Biological organisms adapt to changes by processing informations from different sources, most
notably from their ancestors and from their environment. We review an approach to quantify 
these informations by analyzing mathematical models of evolutionary dynamics, and show how 
explicit results are obtained for a solvable subclass of these models. In several limits, the results 
coincide with those obtained in studies of information processing for communication, gambling 
or thermodynamics. In the most general case, however, information processing by biological 
populations shows unique features that motivate the analysis of specific models. 


I. INTRODUCTION 

Concepts from information theory are central to many quantitative studies of information processing in biology (1).
In particular, the mutual information is commonly used to analyze input-output relationships in cellular
processes such as biochemical sensing and transcriptional regulation (2-5). As a generic measure of information
transmission, the mutual information indeed has a number of attractive mathematical properties (6). As a measure
of biological information, however, it has several shortcomings: it does not account for the organization of cells into
populations or for the role of inherited information and, more generally, its connection to evolutionary fitness may be
questioned. How should the mutual information be amended to account for these features? Do such amendments
always decrease the value of information, thus giving the mutual information the role of an “ideal” upper bound?
Or can these amendments have a major impact on the way information is optimally processed by a cell?

A principled approach to these questions is to follow Shannon’s example (7) in defining and studying an abstract 
mathematical model that captures the essence of the problem of interest without directly (or axiomatically) prescribing
a formula for quantifying information. This formula is instead expected to emerge as a property of the model.
We review here such an approach to the problem of formalizing information processing in growing populations (8). 
Because of similarities but also differences with engineering problems, this approach leads to measures of informations 
that are related but not identical to those obtained from models of communication. 

One crucial difference is that cells reproduce and form populations. This feature is common to problems of 
gambling and financial investment. The first analysis of the value of information in growing populations was in
fact performed by Kelly in relation to horse-race gambling (9). He found that the mutual information emerges 
from the analysis of his model as it does from Shannon’s model of communication (7). His results were later 
extended to show that, in more general models, the mutual information provides only an upper bound on the 
value of information (6; 10). Several studies have pointed out the relevance of these results to biological populations
(11-13). In one of them (8), we analyzed two other generic limitations of the mutual information as a
measure of the value of biological information: its failure to account for constraints of causality, which has also 
been examined in the context of gambling (14), and its failure to account for the distributed nature of biological 
information processing, where each individual cell processes its own information, which has no equivalent in gambling. 
This second feature implies that the value of information may exceed the value given by the mutual information (8; 15). 

Practically, deriving measures of information from abstract models is limited by the difficulty of analyzing
models of sufficient generality mathematically. We show here how explicit formulae for the values of acquired and
inherited informations in growing populations can be obtained for a class of solvable Gaussian models (16). Gaussian
approximations are common in studies of information processing by biochemical networks (15; 17-19). Gaussian
models of population dynamics also have counterparts in several other fields. In information theory, they
correspond to models of transmission of continuous signals in the presence of additive white Gaussian noise (20). In
population genetics, Gaussian models are at the foundation of quantitative genetics, which studies the evolution of
continuous traits (21). In stochastic control theory, they are related to the Kalman filter, a tracking algorithm based
on noisy measurements (22). Finally, in physics, we shall present a formal mapping to the problem of controlling a
Brownian particle in a tunable harmonic potential by feedback.

A more general connection between measures of information in growing populations and in stochastic thermodynamics
was presented recently by Vinkler, Permuter and Merhav (23). Quantifying the value of information for
controlling thermodynamical systems has been the object of many studies (24). Most of them follow the approach
advocated here: a model is defined based on thermodynamical principles and a measure for the value of information
is inferred from an analysis of its physical properties; for instance, this value is identified with the maximal work that
can be extracted based on microscopic measurements (24). Given the different premises, it is all the more interesting
to find that analogous formulae emerge when analyzing information processing in evolutionary dynamics and
thermodynamics.

FIG. 1 Discrete model - A. The environment is a stochastic process with two components: a selective pressure $x_t$ and a cue
$y_t$. The selective pressure $x_t$ follows a Markov process with conditional probability $P_{X_1|X_0}(x_t|x_{t-1})$ and the cue $y_t$ derives
from $x_t$ with conditional probability $P_{Y_1|X_1}(y_t|x_t)$. B. A member of the population at generation $t$ receives two informations,
an inherited type $\phi_{t-1}$, which may differ from individual to individual, and an environmental cue $y_t$, which is common to
all individuals of the same generation $t$. From these two informations, the type $\phi_t$ is generated with conditional probability
$\pi(\phi_t|\phi_{t-1},y_t)$. The fitness of $\phi_t$ given the selective pressure $x_t$ decides the number $\xi$ of descendants of the individual, with
$S(\phi_t,x_t)$ representing the mean value of $\xi$ given $\phi_t$ and $x_t$. The descendants inherit the type $\phi_t$ of their ancestor and are
themselves subject to the next environment $(x_{t+1},y_{t+1})$. At any given time, the composition of the population is characterized
by the number $N_t(\phi_t)$ of individuals of each type $\phi_t$.

The present work thus aims at connecting and extending different lines of work. In the first part, we review the 
problem of quantifying informations in a discrete model of growing population (8). Several aspects are common 
between this problem in gambling and in biology and we highlight the features specific to biological populations. In a 
second part, we show how this model becomes analytically solvable in a continuous limit. The Gaussian model thus
defined extends a model studied by Haccou and Iwasa (25) and can itself be extended to a more general model (16).
In a third part, we present and develop an analogy to problems of stochastic thermodynamics (23), which we apply 
to Gaussian models. Finally, we conclude by discussing some open challenges. 


II. DISCRETE MODEL 

We start by reviewing the properties of a discrete model of information processing in growing populations (8). 


A. Definition 

The model considers a population of non-interacting individuals reproducing asexually in an independently
varying environment (Figure 1). This environment is characterized by a state $x_t$, whose dependency on past history
$x^{t-1} = (x_1,\dots,x_{t-1})$ is represented by a conditional probability $P_{X_t|X^{t-1}}(x_t|x^{t-1})$ (we follow the convention of
denoting random variables by upper-case letters and the values that they take by lower-case letters). An individual at
generation $t$ is characterized by an internal discrete state, $\phi_t$, called its “type”, which determines its reproductive
success. This reproductive success is quantified by $S(\phi_t,x_t)$, the expected number of descendants in the following
generation, given the internal state $\phi_t$ and the external state $x_t$. If $R(\xi|\phi_t,x_t)$ is the probability for an individual of
type $\phi_t$ in environment $x_t$ to have $\xi$ descendants in the next generation (including itself), this reproductive success is
thus given by $S(\phi_t,x_t) = \langle\xi\rangle_{\phi_t,x_t} = \sum_\xi \xi\,R(\xi|\phi_t,x_t)$.

The type $\phi_t$ of an individual may depend on two things: the type $\phi_{t-1}$ of its parent and a cue $y_t$ correlated
to the selective pressure $x_t$ by a conditional probability $P_{Y_t|X_t}(y_t|x_t)$, which we assume to be fixed:
$P_{Y_t|X_t}(y_t|x_t) = P_{Y_1|X_1}(y_t|x_t)$ [also abbreviated $P_{Y|X}(y_t|x_t)$]. The ancestral type $\phi_{t-1}$ represents an inherited
information and the perceived signal $y_t$ an acquired information. The relationship between $\phi_t$, $\phi_{t-1}$ and $y_t$ is
generally considered to be stochastic, and characterized by a conditional probability $\pi(\phi_t|\phi_{t-1},y_t)$. This conditional
probability $\pi$ encodes the information processing strategy followed by each individual of a population, each having
its own $\phi_{t-1}$ and $\phi_t$ but experiencing the same $x_t$ and $y_t$.

While the model can be studied more generally (8), we analyze it here under two simplifying assumptions:
(i) We assume that the environment is stationary, ergodic and Markovian, with $P_{X_t|X^{t-1}}(x_t|x^{t-1}) = P_{X_1|X_0}(x_t|x_{t-1})$.
(ii) We assume that $S(\phi_t,x_t)$ is of the form

$$S(\phi_t,x_t) = K(x_t)\,A(x_t|\phi_t) \quad\text{with}\quad A(x_t|\phi_t) \geq 0 \quad\text{and}\quad \sum_{x_t} A(x_t|\phi_t) = 1. \tag{1}$$

This assumption means that no type $\phi_t$ has a systematic advantage when considering all possible environments $x_t$ (25).
(Here and below, a notation of the type $A(u|v)$ always signifies that $A$ is a transition matrix, with $A(u|v) \geq 0$ and
$\sum_u A(u|v) = 1$ for all $v$.)


B. Fitness and optimality 

The dynamics of the model is summarized by a recursion for $N_t(\phi_t)$, the expected number of individuals of type $\phi_t$
at generation $t$,

$$N_t(\phi_t) = S(\phi_t,x_t) \sum_{\phi_{t-1}} \pi(\phi_t|\phi_{t-1},y_t)\,N_{t-1}(\phi_{t-1}), \tag{2}$$

where the series of environmental states $x^t = (x_1,\dots,x_t)$ and cues $y^t = (y_1,\dots,y_t)$ are considered as externally fixed.

Quantifying the values of the inherited information $\phi_{t-1}$ and acquired information $y_t$ requires a well-defined fitness
function. This fitness function should indicate the outcome of natural selection when two populations with different
strategies $\pi_1$ and $\pi_2$, defining two “species”, are competing. As this outcome may be stochastic, such a fitness
function need not exist (or may depend on the particular realization of the stochastic processes). For our simple
model, however, a population will, in the long term, either become extinct or grow exponentially. In the second case,
the rate of exponential growth, $\Lambda$, depends on the strategy $\pi$, the selection $S$, and the environmental parameters
$P_{X_1|X_0}$ and $P_{Y_1|X_1}$, but not on the particular realization of the dynamics [mathematical details may be found in (8)].
This growth rate thus defines a fitness function to compare the long-term value of different strategies $\pi$.

More precisely, the growth rate is given by the limit

$$\Lambda = \lim_{t\to\infty} \frac{1}{t} \ln \frac{N_t}{N_0}, \tag{3}$$

where $N_t = \sum_{\phi_t} N_t(\phi_t)$ represents the expected total population size at generation $t$. If the environment is stationary
and ergodic, which we shall assume, $\Lambda$ can also be written as

$$\Lambda = \mathbb{E}[\ln W_t], \tag{4}$$

where $W_t = N_t/N_{t-1}$ represents the factor by which the population size is multiplied between two successive
generations, and $\mathbb{E}$ is an expectation with respect to the external random variables $X^t$ and $Y^t$.


$\Lambda(\pi)$ defines a relevant measure of fitness in the sense that, in the long run ($t \to \infty$) and all other things being
equal, a population following strategy $\pi_1$ will almost surely exponentially outnumber a population following $\pi_2$ if
and only if $\Lambda(\pi_1) > \Lambda(\pi_2)$ (provided the population does not become extinct). An optimal strategy $\pi$ can therefore
be defined as a strategy optimizing $\Lambda(\pi)$.
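
As an illustration of how the growth rate can be estimated in practice, the following minimal sketch iterates the population recursion along one realization of the environment and averages $\ln W_t$ over time. It assumes the recursion $N_t(\phi_t) = S(\phi_t,x_t)\sum_{\phi_{t-1}}\pi(\phi_t|\phi_{t-1},y_t)N_{t-1}(\phi_{t-1})$ and a two-state Markovian environment; all names and numerical values (`simulate_growth_rate`, the matrices `S`, `T`, `C`) are hypothetical choices made for the example, not taken from the references.

```python
import math
import random

def simulate_growth_rate(S, pi, T, C, steps=100_000, seed=1):
    """Time average of ln W_t with W_t = N_t / N_{t-1}.

    S[phi][x]         : expected number of descendants of type phi in environment x
    pi[y][prev][phi]  : strategy pi(phi | prev, y)
    T[x_prev][x]      : Markov transition probabilities of the environment
    C[x][y]           : probability of cue y given environment x
    """
    rng = random.Random(seed)
    n_types = len(S)
    freq = [1.0 / n_types] * n_types   # normalized composition of the population
    x = 0
    total = 0.0
    for _ in range(steps):
        x = rng.choices(range(len(T)), weights=T[x])[0]
        y = rng.choices(range(len(C[x])), weights=C[x])[0]
        new = [
            S[phi][x] * sum(pi[y][prev][phi] * freq[prev] for prev in range(n_types))
            for phi in range(n_types)
        ]
        w = sum(new)                   # growth factor W_t
        total += math.log(w)
        freq = [n / w for n in new]    # renormalize to avoid overflow
    return total / steps

def exact_no_inheritance(S, pi_y, p_x, C):
    """Exact E[ln W_t] when pi(phi | prev, y) = pi_y[y][phi] does not depend on prev."""
    return sum(
        p_x[x] * C[x][y]
        * math.log(sum(S[phi][x] * pi_y[y][phi] for phi in range(len(S))))
        for x in range(len(p_x))
        for y in range(len(C[x]))
    )

# Hypothetical parameters for a two-state environment and two types.
S = [[2.0, 0.5], [0.4, 1.8]]
T = [[0.9, 0.1], [0.2, 0.8]]
C = [[0.8, 0.2], [0.3, 0.7]]
pi_y = [[0.8, 0.2], [0.3, 0.7]]
pi = [[pi_y[y], pi_y[y]] for y in range(2)]   # same rule for every parental type
p_x = [2.0 / 3.0, 1.0 / 3.0]                  # stationary distribution of T

simulated = simulate_growth_rate(S, pi, T, C)
exact = exact_no_inheritance(S, pi_y, p_x, C)
print(simulated, exact)
```

The strategy used here ignores the parental type, so the time average can be compared with an exact expectation; passing a `pi` that depends on `prev` gives an estimate of $\Lambda$ in the general case, where no such closed form is available.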


C. Informations 

We define the value of an information as the increment of fitness that it may confer. This involves a comparison
between the growth rates of two models, one in which the information is available, and one in which it is not.
Mathematically, no information can be acquired when $\pi$ is of the form $\pi(\phi_t|\phi_{t-1})$ and no information is inherited
when it is of the form $\pi(\phi_t|y_t)$. More generally, let $V_0$ be a subset of the set $V_1$ of admissible strategies in which $\pi$ is
prevented from accessing a particular information. Then we define the value of this information as

$$I = \max_{\pi\in V_1} \Lambda(\pi) - \max_{\pi\in V_0} \Lambda(\pi). \tag{5}$$

FIG. 2 Constrained information processing - The conditional probability $\pi(\phi_t|\phi_{t-1},y_t)$ which decides the type $\phi_t$ of an
individual given the inherited type $\phi_{t-1}$ and the environmental cue $y_t$ (Figure 1) may be constrained. A. Replication may
be subject to mutations such that the individual effectively inherits $\phi'_{t-1}$ with conditional probability $M(\phi'_{t-1}|\phi_{t-1})$. Sensing
may be subject to noise such that the individual effectively perceives $\psi_t$ with conditional probability $C(\psi_t|y_t)$. Given these
two elements, an individual generates its type $\phi_t$ with conditional probability $\pi_0(\phi_t|\phi'_{t-1},\psi_t)$. While the environmental cue
$y_t$ is common to all members of the population at a given generation $t$, the perceived signal $\psi_t$ may differ from individual to
individual. B. The type may have two components: a phenotype $\phi_t$ which decides the number of descendants via $S(\phi_t,x_t)$ and
a genotype $\gamma_t$ which defines the information transmitted to these descendants. The genotype may be described by a conditional
probability $H(\gamma_t|\gamma_{t-1})$ and the phenotype by $D(\phi_t|\gamma_{t-1},y_t)$, where $\gamma_{t-1}$ represents the inherited genotype, which, by definition, is
the only component of the type to be inherited.


In particular, the value of acquired information $I_{\rm acquired}$ is defined by considering the subset $V_0$ of strategies
of the form $\pi(\phi_t|\phi_{t-1})$, and the value of inherited information $I_{\rm inherited}$ of the form $\pi(\phi_t|y_t)$. By taking
for $V_0$ the subset of strategies of the form $\pi(\phi_t)$, we also define the joint value of the two informations,
$I_{\rm tot}$, which is generally not the sum $I_{\rm acquired} + I_{\rm inherited}$, since
$I_{\rm tot} = \max_\pi \Lambda(\pi) - \max_{\pi(\phi_t)} \Lambda(\pi) \neq 2\max_\pi \Lambda(\pi) - \max_{\pi(\phi_t|\phi_{t-1})} \Lambda(\pi) - \max_{\pi(\phi_t|y_t)} \Lambda(\pi)$.

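Additional constraints may be present that restrain $V_0$ and $V_1$ to a subclass of admissible strategies. For
instance, the transmission of inherited information may be noisy because of random mutations following replication,
with $\pi$ necessarily of the form $\pi(\phi_t|\phi_{t-1},y_t) = \sum_{\phi'_{t-1}} \pi_0(\phi_t|\phi'_{t-1},y_t)\,M(\phi'_{t-1}|\phi_{t-1})$, where $M$ is a
given mutational matrix, and where only the conditional probability $\pi_0(\phi_t|\phi'_{t-1},y_t)$ is subject to optimization
(Figure 2A). This corresponds to replacing Eq. (2) by $N_t(\phi_t) = S(\phi_t,x_t)\sum_{\phi'_{t-1}} \pi_0(\phi_t|\phi'_{t-1},y_t)\,N'_{t-1}(\phi'_{t-1})$, where
$N'_{t-1}(\phi'_{t-1}) = \sum_{\phi_{t-1}} M(\phi'_{t-1}|\phi_{t-1})\,N_{t-1}(\phi_{t-1})$ represents the number of individuals mutated to $\phi'_{t-1}$.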

Similarly, the acquisition of an information from the environmental variable $y_t$ may be limited by a noisy sensor
$C(\psi_t|y_t)$, with $\pi$ constrained to be of the form $\pi(\phi_t|\phi_{t-1},y_t) = \sum_{\psi_t} \pi_0(\phi_t|\phi_{t-1},\psi_t)\,C(\psi_t|y_t)$ (Figure 2A). This
constraint introduces a distinction between two types of informations: $y_t$, which is a feature of the environment
and is common to all members of the population at generation $t$, and $\psi_t$, which is associated with a particular
individual (we use Roman letters for environmental variables and Greek letters for individual variables). For
instance, $y_t$ may represent the concentration of one of several constituents of the environment, related to $x_t$ by
$P_{Y|X}(y_t|x_t)$, and $\psi_t$ the concentration of this constituent as perceived by a particular individual, given its noisy
sensor $C(\psi_t|y_t)$. The cue $y_t$ and the sensor $C$ are common to all individuals but not necessarily the perceived
signal $\psi_t$. This decomposition may be viewed as the counterpart at a population level of the decomposition between
extrinsic and intrinsic noise at the individual level (26): as intrinsic noise corresponds to intra-individual variations
and extrinsic noise to inter-individual variations in gene expression, the intrinsic information $\psi_t$ corresponds to
intra-generation variations and the extrinsic information $y_t$ to inter-generation variations in information sensing.
This distinction becomes important when evaluating the value of the information provided by the sensor $C(\psi_t|y_t)$,
as opposed to the value of the information provided by the “environmental channel” $P_{Y|X}(y_t|x_t)$ (see examples below).

Another biologically motivated constraint on $\pi$ is the decomposition of the type of an individual into a genotype,
which is inherited and transmitted, and a phenotype on which selection acts. A generic model making this distinction
is for instance defined by the recursion

$$N_t(\gamma_t) = \sum_{\phi_t,\gamma_{t-1}} S(\phi_t,x_t)\,H(\gamma_t|\gamma_{t-1},\phi_t,z_t)\,D(\phi_t|\gamma_{t-1},y_t)\,N_{t-1}(\gamma_{t-1}), \tag{6}$$

where $D(\phi_t|\gamma_{t-1},y_t)$ specifies how the phenotype $\phi_t$ stochastically depends on the inherited genotype $\gamma_{t-1}$ and some
aspect $y_t$ of the environment and $H(\gamma_t|\gamma_{t-1},\phi_t,z_t)$ how the transmitted genotype $\gamma_t$ depends on the inherited genotype
and some possibly different aspect $z_t$ of the environment (Figure 2B). As shown in Appendix A, this model corresponds
to Eq. (2) when $\pi$ is constrained to a particular set of admissible strategies. The model defined by Eq. (6), however,
has two acquired informations: $y_t$ at the phenotypic level and $z_t$ at the genotypic level (each of which may be decomposed
into extrinsic and intrinsic contributions). This extension corresponds to a discrete version of the model proposed
in (16) and illustrates the fact that multiple acquired informations may be defined and quantified. Similarly, the model
can be extended to deal with multiple inherited informations, for instance to represent a genetic and an epigenetic
contribution to heredity.


D. Solvable limits 

In two limits, Eq. (2) factorizes into a recursion that involves only the total population size $N_t = \sum_{\phi_t} N_t(\phi_t)$. The
first limit is when the environment is maximally selective, so that only one type $\phi_t$, which may be defined without
loss of generality as $\phi_t = x_t$, can survive in each environmental state $x_t$,

$$S(\phi_t,x_t) = K(x_t)\,\delta(x_t,\phi_t) \qquad \text{[perfect selectivity]} \tag{7}$$

where $K(x_t)$ represents the multiplicative rate of the surviving type, and $\delta$ denotes the Kronecker symbol, with
$\delta(x_t,\phi_t) = 1$ if $\phi_t = x_t$ and 0 otherwise. This corresponds to $A(x_t|\phi_t) = \delta(x_t,\phi_t)$ in Eq. (1). In this case, $N_t = N_t(x_t)$
and

$$N_t = W_t N_{t-1} \quad\text{with}\quad W_t = K(x_t)\,\pi(x_t|x_{t-1},y_t). \tag{8}$$

The second limit is in the absence of inheritance, when the current type $\phi_t$ of an individual cannot depend on its ancestral
type $\phi_{t-1}$,

$$\pi(\phi_t|\phi_{t-1},y_t) = \pi(\phi_t|y_t) \qquad \text{[no inheritance]} \tag{9}$$

which implies

$$N_t = W_t N_{t-1} \quad\text{with}\quad W_t = \sum_{\phi_t} S(\phi_t,x_t)\,\pi(\phi_t|y_t). \tag{10}$$

Given the assumption made in Eq. (1), this may be rewritten as $N_t = K(x_t)\,\hat\pi(x_t|y_t)\,N_{t-1}$, as in Eq. (8), but with an
effective strategy $\hat\pi$ defined by

$$\hat\pi(x_t|y_t) = \sum_{\phi_t} A(x_t|\phi_t)\,\pi(\phi_t|y_t). \tag{11}$$

The effective strategy $\hat\pi$ is here constrained to a particular subset $V_1$, as in the examples discussed above.

The conjunction of the two limits, perfect selectivity and no inheritance, defines Kelly’s model (9), where

$$N_t = W_t N_{t-1} \quad\text{with}\quad W_t = K(x_t)\,\pi(x_t|y_t), \tag{12}$$

and therefore

$$\Lambda = \mathbb{E}[\ln W_t] = \mathbb{E}_X[\ln K(X)] + \mathbb{E}_{X,Y}[\ln \pi(X|Y)], \tag{13}$$

where $\mathbb{E}_X[\ln K(X)] = \sum_x P_X(x)\ln K(x)$ with $P_X(x) = P_{X_t}(x)$ describing the probability of $X_t = x$ (since the
environment is assumed to be stationary, it is independent of $t$), and where $\mathbb{E}_{X,Y}[\ln\pi(X|Y)] = \sum_{x,y} P_{X,Y}(x,y)\ln\pi(x|y)$
with $P_{X,Y}(x,y) = P_{Y|X}(y|x)P_X(x)$ describing the joint probability of $(x_t,y_t) = (x,y)$.
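
To make the growth rate of Kelly's model concrete, the sketch below evaluates $\Lambda = \mathbb{E}_X[\ln K(X)] + \mathbb{E}_{X,Y}[\ln\pi(X|Y)]$ for a binary environment observed through a noisy cue. It compares the proportional strategy $\pi = P_{X|Y}$ with the cue-blind strategy $\pi = P_X$ and checks that the difference equals the mutual information $I(X;Y)$. The function names and the numerical values are hypothetical choices made for the example.

```python
import math

def kelly_growth_rate(p_x, p_y_given_x, odds, strategy):
    """Lambda = E_X[ln K(X)] + E_{X,Y}[ln pi(X|Y)], with strategy[y][x] = pi(x|y)."""
    rate = 0.0
    for x, px in enumerate(p_x):
        for y, pyx in enumerate(p_y_given_x[x]):
            rate += px * pyx * (math.log(odds[x]) + math.log(strategy[y][x]))
    return rate

def posterior(p_x, p_y_given_x):
    """Bayes' rule: returns post[y][x] = P(X = x | Y = y)."""
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    post = []
    for y in range(n_y):
        p_y = sum(p_x[x] * p_y_given_x[x][y] for x in range(n_x))
        post.append([p_x[x] * p_y_given_x[x][y] / p_y for x in range(n_x)])
    return post

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in nats."""
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(
        p_x[x] * p_y_given_x[x][y] * math.log(p_y_given_x[x][y] / p_y[y])
        for x in range(n_x)
        for y in range(n_y)
    )

# Hypothetical example: two horses, cue correct with probability 0.9.
p_x = [0.7, 0.3]
p_y_given_x = [[0.9, 0.1], [0.1, 0.9]]
odds = [2.0, 2.0]

with_cue = kelly_growth_rate(p_x, p_y_given_x, odds, posterior(p_x, p_y_given_x))
without_cue = kelly_growth_rate(p_x, p_y_given_x, odds, [p_x, p_x])
info = mutual_information(p_x, p_y_given_x)
print(with_cue - without_cue, info)
```

The odds enter only through $\mathbb{E}_X[\ln K(X)]$, which cancels in the difference, so the value of the cue does not depend on them in this limit.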

In the original formulation of this model (9), $N_t$ is a capital that a gambler bets on successive horse races and
$x_t$ represents the horse winning on race $t$, $K(\phi_t)$ the odds for horse $\phi_t$ (the ratio of the full payout to the stake if
it wins) and $y_t$ a side-information hinting at the identity of $x_t$. The betting strategy $\pi(\phi_t|y_t)$ defines the fraction
of capital bet on each horse $\phi_t$ given the information $y_t$, which the gambler wants to choose so as to maximize the
cumulative gain $N_t = \prod_{k=1}^{t} W_k\, N_0$. In this interpretation, an individual corresponds to a particular unit of currency,
say a 1€ coin, and the “type” of a coin to the horse on which it is bet.

The analogy extends to models with finite selectivity, corresponding to multiple horses having non-zero return, and
to models with inheritance, corresponding to a gambler with memory (6). Some aspects of information processing
in biological populations have, however, no analogy in gambling, such as the distinction between extrinsic and
intrinsic informations. Information processing is indeed centralized in gambling, where a gambler controls each of
their coins, while it is distributed in biology, where each member of a population can act independently and stochastically.

In the following two sections, we summarize the properties of the model in each of the two generally solvable limits 
of no inheritance and perfect selectivity, before introducing a continuous model that can be solved beyond these two 
limits. We refer to (6; 8; 14) for a derivation of the results. 


E. No inheritance 


Assuming no inheritance, i.e., $\pi$ constrained to the form $\pi(\phi_t|y_t)$, we can write the growth rate as (see Appendix B)

$$\Lambda = \Lambda^* - H(X) + I(X;Y) - \mathbb{E}_Y\big[D\big(P_{X|Y}(\cdot|Y)\,\|\,\hat\pi(\cdot|Y)\big)\big], \tag{14}$$

where $\hat\pi$ is the effective strategy defined in Eq. (11). In this decomposition, each term has an interpretation of its
own (6):

• $\Lambda^* = \mathbb{E}_X[\ln K(X)] = \sum_x P_X(x)\ln K(x)$ corresponds to a maximal growth rate, possibly achievable only if knowing
exactly the sequence of environmental states;

• $H(X) = -\sum_x P_X(x)\ln P_X(x)$ is the entropy of $X_t$, and represents here a cost due to the stochasticity of the
environmental process;

• $I(X;Y)$ is the mutual information between the cue $Y_t$ and the selective variable $X_t$, defined by

$$I(X;Y) = \sum_{x,y} P_{X,Y}(x,y) \ln \frac{P_{X,Y}(x,y)}{P_X(x)P_Y(y)}, \tag{15}$$

where $P_Y(y) = \sum_x P_{Y|X}(y|x)P_X(x)$ is the probability of $y_t = y$. It can also be written $I(X;Y) = H(Y) - H(Y|X)$ or
$I(X;Y) = H(X) - H(X|Y)$ if introducing the conditional entropy $H(Y|X) = -\sum_{x,y} P_{X,Y}(x,y)\ln P_{Y|X}(y|x)$. The
mutual information represents here a gain due to the information about $X_t$ that is contained in $Y_t$ and is zero if and
only if $X_t$ and $Y_t$ are independent random variables;

• $\mathbb{E}_Y\big[D\big(P_{X|Y}(\cdot|Y)\,\|\,\hat\pi(\cdot|Y)\big)\big] = \sum_y P_Y(y)\,D\big(P_{X|Y}(\cdot|y)\,\|\,\hat\pi(\cdot|y)\big)$ represents the cost of following a suboptimal strategy.
It involves a relative entropy, which is generally defined between two distributions $P(x)$ and $Q(x)$ as

$$D(P\|Q) = \sum_x P(x) \ln \frac{P(x)}{Q(x)}, \tag{16}$$

with $D(P\|Q) \geq 0$ and $D(P\|Q) = 0$ if and only if $P = Q$. It also involves $P_{X|Y}$, the conditional probability of $X_t$ given
$Y_t$, which by Bayes’ rule is given by

$$P_{X|Y}(x|y) = \frac{P_{Y|X}(y|x)\,P_X(x)}{P_Y(y)}. \tag{17}$$


Since $\pi$ appears only in the last term of Eq. (14), which is necessarily non-negative, the optimal growth rate is

$$\Lambda = \Lambda^* - H(X) + I(X;Y) - \min_\pi \mathbb{E}_Y\big[D\big(P_{X|Y}(\cdot|Y)\,\|\,\hat\pi(\cdot|Y)\big)\big]. \tag{18}$$

In computing the minimum, two situations may arise. If the equation $\hat\pi = P_{X|Y}$ has a solution in $\pi$, then this solution
optimizes the growth rate by reducing to zero the relative entropy term, and $\Lambda = \Lambda^* - H(X) + I(X;Y)$. Otherwise,
$\Lambda < \Lambda^* - H(X) + I(X;Y)$.
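
The decomposition of the growth rate into a maximal rate, an entropy cost, a mutual-information gain and a relative-entropy penalty can be checked numerically. The sketch below assumes a no-inheritance model with finite selectivity, $S(\phi,x) = K(x)A(x|\phi)$, computes $\Lambda$ directly as $\sum_{x,y}P_{X,Y}(x,y)\ln[K(x)\hat\pi(x|y)]$, and compares it with the four-term expression. Function names and numerical values are hypothetical.

```python
import math

def effective_strategy(A, pi):
    """pi_hat[y][x] = sum_phi A[phi][x] * pi[y][phi]."""
    n_x = len(A[0])
    return [
        [sum(A[phi][x] * pi_y[phi] for phi in range(len(A))) for x in range(n_x)]
        for pi_y in pi
    ]

def growth_rate_direct(p_x, p_y_x, K, pi_hat):
    """E[ln W] with W = K(x) * pi_hat(x|y)."""
    return sum(
        p_x[x] * p_y_x[x][y] * math.log(K[x] * pi_hat[y][x])
        for x in range(len(p_x))
        for y in range(len(p_y_x[x]))
    )

def growth_rate_decomposed(p_x, p_y_x, K, pi_hat):
    """Lambda* - H(X) + I(X;Y) - E_Y[D(P_{X|Y} || pi_hat)]."""
    n_x, n_y = len(p_x), len(p_y_x[0])
    p_y = [sum(p_x[x] * p_y_x[x][y] for x in range(n_x)) for y in range(n_y)]
    lam_star = sum(p_x[x] * math.log(K[x]) for x in range(n_x))
    entropy = -sum(p * math.log(p) for p in p_x)
    info = sum(
        p_x[x] * p_y_x[x][y] * math.log(p_y_x[x][y] / p_y[y])
        for x in range(n_x)
        for y in range(n_y)
    )
    cost = 0.0
    for y in range(n_y):
        for x in range(n_x):
            post = p_x[x] * p_y_x[x][y] / p_y[y]   # Bayes' rule
            cost += p_y[y] * post * math.log(post / pi_hat[y][x])
    return lam_star - entropy + info - cost

# Hypothetical parameters (all probabilities strictly positive).
p_x = [0.6, 0.4]
p_y_x = [[0.8, 0.2], [0.3, 0.7]]
K = [1.5, 3.0]
A = [[0.9, 0.1], [0.2, 0.8]]        # A[phi][x], each row sums to 1
pi = [[0.7, 0.3], [0.25, 0.75]]     # pi[y][phi]

pi_hat = effective_strategy(A, pi)
direct = growth_rate_direct(p_x, p_y_x, K, pi_hat)
decomposed = growth_rate_decomposed(p_x, p_y_x, K, pi_hat)
print(direct, decomposed)
```

The two numbers agree for any strategy, which is what makes the last term interpretable as the only strategy-dependent cost.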


When considering the value of acquired information, the optimal growth rate in the absence of information,
$\Lambda = \Lambda^* - H(X) - \min_\pi D(P_X\|\hat\pi)$, must also be evaluated [minimum over $V_0$ in Eq. (5)]:

$$I_{\rm acquired} = I(X;Y) - \min_\pi \mathbb{E}_Y\big[D\big(P_{X|Y}(\cdot|Y)\,\|\,\hat\pi(\cdot|Y)\big)\big] + \min_\pi D(P_X\|\hat\pi). \tag{19}$$

Since $\hat\pi = P_X$ has a solution whenever $\hat\pi = P_{X|Y}$ has a solution $\pi(x|y)$ [given by $\pi(x) = \sum_y \pi(x|y)P_Y(y)$], three cases
must be considered: (i) $\hat\pi = P_{X|Y}$ has a solution (implying that $\hat\pi = P_X$ has one); (ii) $\hat\pi = P_X$ has a solution but not
$\hat\pi = P_{X|Y}$; (iii) $\hat\pi = P_X$ has no solution (implying that $\hat\pi = P_{X|Y}$ has none). In the first case, $I_{\rm acquired} = I(X;Y)$,
while in the two others $I_{\rm acquired} < I(X;Y)$, as may be proved even without assuming Eq. (1) (6).

In any case, the value of acquired information is bounded by a mutual information, $I_{\rm acquired} \leq I(X;Y)$. This
mutual information, however, is between the selective pressure $X_t$ and the cue $Y_t$, both environmental variables, and
not between the input $Y_t$ and the output $\Psi_t$ of the sensor of a particular individual. The mutual information $I(\Psi;Y)$
can indeed exceed $I_{\rm acquired}$ as shown explicitly with a two-state model in (8) and with a Gaussian model below. The
value of acquired information in the presence of a sensor with noise $C(\psi|y)$ is

$$I_{\rm acquired} = I(X;Y) - \min_{\pi_0} \mathbb{E}_Y\big[D\big(P_{X|Y}(\cdot|Y)\,\|\,\hat\pi * C(\cdot|Y)\big)\big] + \min_\pi D(P_X\|\hat\pi), \tag{20}$$

where $\hat\pi * C(x|y) = \sum_{\phi,\psi} A(x|\phi)\,\pi_0(\phi|\psi)\,C(\psi|y)$. A sensor with a given noise $C(\psi|x)$ is in fact always more
valuable than an environmental channel $P_{Y|X}$ with the same noise (8). This is most simply illustrated with a model
with perfect selectivity, as described by Eq. (13). In this case, $Y_t = X_t$ implies $\Lambda = \Lambda^* + \mathbb{E}_X \ln\hat\pi(X|X)$ with
$\hat\pi(x|x) = \sum_\psi \pi_0(x|\psi)\,C(\psi|x) = \mathbb{E}_{\Psi|X=x}\,\pi_0(x|\Psi)$ and, by concavity of the logarithm,

$$\Lambda = \Lambda^* + \mathbb{E}_X \ln \mathbb{E}_{\Psi|X}\,\pi_0(X|\Psi) \geq \Lambda^* + \mathbb{E}_{X,\Psi} \ln \pi_0(X|\Psi). \tag{21}$$

The right-hand side corresponds to the growth rate of a model with $\Psi_t = Y_t$, where $Y_t$ is given by $P_{Y|X}(y_t|x_t) =
C(y_t|x_t)$. This inequality is analogous to the statement in statistical mechanics that the quenched free energy of a
disordered system is bounded from below by the corresponding annealed free energy. It represents here the benefit
of multiple distributed sensors over a single centralized sensor with the same noise.
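
The advantage of distributed sensors can be illustrated with a small numerical comparison. The sketch below assumes perfect selectivity and a cue equal to the environment, so that the only noise is that of the sensor $C(\psi|x)$. It compares the growth-rate increment $\mathbb{E}_X\ln\mathbb{E}_{\Psi|X}\,\pi_0(X|\Psi)$, obtained when each individual draws its own $\psi$, with $\mathbb{E}_{X,\Psi}\ln\pi_0(X|\Psi)$, obtained when the whole population shares one noisy reading. The names and values are hypothetical, and $\pi_0$ is an arbitrary fixed strategy rather than an optimized one.

```python
import math

def distributed_gain(p_x, C, pi0):
    """E_X ln E_{Psi|X} pi0(X|Psi): each individual senses independently."""
    return sum(
        p_x[x] * math.log(sum(C[x][psi] * pi0[psi][x] for psi in range(len(pi0))))
        for x in range(len(p_x))
    )

def centralized_gain(p_x, C, pi0):
    """E_{X,Psi} ln pi0(X|Psi): one noisy reading shared by the population."""
    return sum(
        p_x[x] * C[x][psi] * math.log(pi0[psi][x])
        for x in range(len(p_x))
        for psi in range(len(pi0))
    )

# Hypothetical parameters.
p_x = [0.6, 0.4]
C = [[0.85, 0.15], [0.25, 0.75]]    # C[x][psi]
pi0 = [[0.8, 0.2], [0.3, 0.7]]      # pi0[psi][x]

dist = distributed_gain(p_x, C, pi0)
cent = centralized_gain(p_x, C, pi0)
print(dist, cent)
```

The ordering follows from Jensen's inequality applied to the logarithm, so it holds for any choice of `p_x`, `C` and `pi0`, not only for the values used here.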


F. Perfect selectivity 

In the other limit of perfect selectivity, an expression formally similar to Eq. (14) can be written

$$\Lambda = \Lambda^* - H(X_1|X_0) + I(X_1;Y_1|X_0) - \mathbb{E}_{X_0,Y_1}\big[D\big(P_{X_1|X_0,Y_1}(\cdot|X_0,Y_1)\,\|\,\pi(\cdot|X_0,Y_1)\big)\big], \tag{22}$$

where a conditioning on the past environment $X_0$ needs to be added (and where $\pi$ replaces $\hat\pi$). Here,
the conditional mutual information $I(X_1;Y_1|X_0)$ is defined by $I(X_1;Y_1|X_0) = H(Y_1|X_0) - H(Y_1|X_1)$ [since
$H(Y_1|X_1,X_0) = H(Y_1|X_1)$].


The optimum growth rate is obtained for $\pi$ minimizing the last term of Eq. (22). In the absence of constraints, it is
reached for $\pi = P_{X_1|X_0,Y_1}$, corresponding to $\Lambda = \Lambda^* - H(X_1|X_0) + I(X_1;Y_1|X_0)$. In this case, $I_{\rm acquired} = I(X_1;Y_1|X_0)$.
Since $I(X_1;Y_1|X_0) = I(X_1;Y_1) - I(X_0;Y_1)$, the difference with the instantaneous mutual information $I(X_1;Y_1)$ is
exactly $I(X_0;Y_1)$, the value of the cue $Y_t$ that is already contained in the knowledge of the past environmental state
$X_{t-1}$. More generally, with constraints, the last term may not vanish and $I_{\rm acquired} < I(X_1;Y_1|X_0)$.

The value of inherited information is read from another equivalent decomposition of the growth rate where $X_0$ and
$Y_1$, which play similar roles, are formally exchanged:

$$\Lambda = \Lambda^* - H(X_1|Y_1) + I(X_1;X_0|Y_1) - \mathbb{E}_{X_0,Y_1}\big[D\big(P_{X_1|X_0,Y_1}(\cdot|X_0,Y_1)\,\|\,\pi(\cdot|X_0,Y_1)\big)\big]. \tag{23}$$

This implies $I_{\rm inherited} \leq I(X_0;X_1|Y_1)$, where the conditional mutual information $I(X_0;X_1|Y_1)$ takes into account
that some of the information contained in $X_{t-1}$ is also present in $Y_t$.

Finally, the total information conferred by the two sources of information satisfies

$$I_{\rm tot} \leq I(X_1;X_0) + I(X_1;Y_1|X_0) = I(X_1;Y_1) + I(X_0;X_1|Y_1), \tag{24}$$

with equality in the absence of constraints.
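
The information identities used in this section can be verified on a small example. The sketch below builds the joint distribution of $(X_0, X_1, Y_1)$ for a hypothetical two-state Markov environment observed through a noisy channel, computes the relevant (conditional) mutual informations from marginal entropies, and checks that $I(X_1;Y_1|X_0) = I(X_1;Y_1) - I(X_0;Y_1)$ and that the two expressions for the bound on $I_{\rm tot}$ coincide. All names and values are assumptions made for the illustration.

```python
import math
from itertools import product

def marginal_entropy(joint, keep):
    """Entropy (in nats) of the marginal over the variable indices in `keep`."""
    marg = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marg.values() if p > 0.0)

def cond_mi(joint, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are index tuples."""
    return (
        marginal_entropy(joint, a + c)
        + marginal_entropy(joint, b + c)
        - marginal_entropy(joint, a + b + c)
        - marginal_entropy(joint, c)
    )

# Hypothetical two-state environment; variable indices: 0 -> X0, 1 -> X1, 2 -> Y1.
T = [[0.9, 0.1], [0.2, 0.8]]       # T[x0][x1]
C = [[0.8, 0.2], [0.3, 0.7]]       # C[x1][y1]
p0 = [2.0 / 3.0, 1.0 / 3.0]        # stationary distribution of T
joint = {
    (x0, x1, y1): p0[x0] * T[x0][x1] * C[x1][y1]
    for x0, x1, y1 in product(range(2), repeat=3)
}

X0, X1, Y1 = (0,), (1,), (2,)
i_acquired = cond_mi(joint, X1, Y1, X0)     # I(X1;Y1|X0)
i_instant = cond_mi(joint, X1, Y1)          # I(X1;Y1)
i_redundant = cond_mi(joint, X0, Y1)        # I(X0;Y1)
i_inherited = cond_mi(joint, X0, X1, Y1)    # I(X0;X1|Y1)
i_memory = cond_mi(joint, X1, X0)           # I(X1;X0)
print(i_acquired, i_instant - i_redundant)
print(i_memory + i_acquired, i_instant + i_inherited)
```

Both identities rely only on the Markov structure $X_0 \to X_1 \to Y_1$ and the chain rule, so they remain valid for any transition matrix and channel substituted for `T` and `C`.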

In the presence of inheritance, the role of the mutual information is thus played by a conditional mutual information. The
conditional mutual information $I(X_1;Y_1|X_0)$ not only differs from the instantaneous mutual information $I(X_1;Y_1)$,
but also from the rate of path/trajectory mutual information, which is defined from the mutual information $I(X^t;Y^t)$
between the processes $X^t = (X_1,\dots,X_t)$ and $Y^t = (Y_1,\dots,Y_t)$ as $\lim_{t\to\infty} I(X^t;Y^t)/t$. The difference becomes
apparent when applying the chain rule (6) to write $I(X^t;Y^t) = \sum_{k=1}^{t} I(X_k;Y^t|X_{k-1})$ since $I(X_k;Y^t|X_{k-1}) \geq
I(X_k;Y_k|X_{k-1}) = I(X_1;Y_1|X_0)$, with, in general, a strict inequality. This inequality accounts for a constraint of
causality: an individual has access at time $t$ to the present cue $y_t$, but not to future cues $y_k$ with $k > t$, which
could allow for a better estimation of $x_t$ if available. These considerations extend in non-Markovian environments to
strategies of the form $\pi(\phi_t|\phi_{t-1},y^t)$, where an individual has access to past cues $y_k$ with $k < t$ (14). The value of
acquired information then corresponds to the more general concept of directed information, denoted $I(Y \to X)$, which
appears repeatedly in problems of feedback control where constraints of causality are involved (27). The conditional
mutual information $I(X_1;Y_1|X_0)$ is the particular value taken by the directed information $I(Y \to X)$ when considering
stationary, Markovian stochastic processes. The directed information generally differs from the transfer entropy, also
proposed to quantify the causal relationships between stochastic processes (28; 29).


III. GAUSSIAN MODEL 

We now present a continuous limit of the discrete model for which the growth rate A can be computed analytically 
beyond the two cases of perfect selectivity and no inheritance. 


A. Definition 


A model with continuous traits $\phi_t \in \mathbb{R}$ is defined by replacing Eq. (2) with

$$n_t(\phi_t) = \frac{1}{W_t}\, S(\phi_t,x_t) \int d\phi_{t-1}\, \pi(\phi_t|\phi_{t-1},y_t)\, n_{t-1}(\phi_{t-1}), \tag{25}$$

where $n_t(\phi_t)$ represents the density of individuals with trait $\phi_t$ in the current population, with $n_t(\phi_t) \geq 0$ and
$\int d\phi_t\, n_t(\phi_t) = 1$. The function $S(\phi_t,x_t)$ is chosen as in Eq. (1) to be of a factorized form

$$S(\phi_t,x_t) = K(x_t)\,G_{\sigma_s^2}(\phi_t - x_t), \tag{26}$$

where $G_{\sigma^2}(x) = (2\pi\sigma^2)^{-1/2}\exp(-x^2/2\sigma^2)$ represents a generic Gaussian function and $K(x) > 0$ is arbitrary. We
parametrize $\pi$ as

$$\pi(\phi_t|\phi_{t-1},y_t) = G_{\sigma_\pi^2}(\phi_t - \lambda\phi_{t-1} - \kappa y_t), \tag{27}$$

where $\sigma_\pi^2$ quantifies the degree of stochasticity, $\lambda$ the contribution of the inherited information and $\kappa$ of the acquired
information (it can be shown that the optimal $\pi$ is necessarily of this form).


The growth rate $\Lambda$ associated with this model can be computed analytically for different environmental processes,
but we consider here a stationary Markovian Gaussian process, i.e., a discrete Ornstein-Uhlenbeck process:

$$P_{X_1|X_0}(x_{t+1}|x_t) = G_{\sigma^2_{x_1|x_0}}(x_{t+1} - a x_t), \tag{28}$$

where $a < 1$ parametrizes the temporal correlation between successive environments and the conditional variance
$\sigma^2_{x_1|x_0} = \mathrm{Var}[X_1|X_0]$ the amplitude of their variations. This interpretation follows from noticing that
$\mathbb{E}[X_1 X_0] = a\sigma_x^2$ and $\sigma_x^2 = \mathbb{E}[X_1^2] = \sigma^2_{x_1|x_0}/(1 - a^2)$. Finally, we take a Gaussian channel for $P_{Y_1|X_1} = P_{Y|X}$:

$$P_{Y|X}(y_t|x_t) = G_{\sigma^2_{y|x}}(y_t - x_t), \tag{29}$$

where $\sigma^2_{y|x} = \mathrm{Var}[Y_1|X_1]$ represents its noise. For independent environments ($a = 0$), this model was studied in (25).
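
The relations between the parameters of the discrete Ornstein-Uhlenbeck environment can be checked by direct simulation. The sketch below assumes the update $x_{t+1} = a x_t + \eta_t$ with independent Gaussian increments $\eta_t$ of variance `var_step` (standing for the conditional variance of $X_1$ given $X_0$), and estimates the stationary variance and the covariance between successive environments. Names and numerical values are hypothetical.

```python
import math
import random

def simulate_ou(a, var_step, steps, seed=0):
    """Sample a path of x_{t+1} = a * x_t + Gaussian noise of variance var_step."""
    rng = random.Random(seed)
    sd_step = math.sqrt(var_step)
    # Start from the stationary distribution, of variance var_step / (1 - a^2).
    x = rng.gauss(0.0, math.sqrt(var_step / (1.0 - a * a)))
    path = []
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, sd_step)
        path.append(x)
    return path

a, var_step = 0.8, 0.36
path = simulate_ou(a, var_step, steps=200_000)

var_x = sum(x * x for x in path) / len(path)
cov_lag1 = sum(u * v for u, v in zip(path, path[1:])) / (len(path) - 1)
var_x_theory = var_step / (1.0 - a * a)

print(var_x, var_x_theory)
print(cov_lag1, a * var_x_theory)
```

The empirical estimates fluctuate around the theoretical values with an error that decreases with the length of the path and grows as $a$ approaches 1, where successive environments become strongly correlated.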

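The growth rate $\Lambda$ for this model can be computed analytically (see Appendix D):

$$\Lambda = \Lambda^* - \frac{1}{2}\ln(2\pi\sigma_s^2) + \frac{1}{2}\ln\frac{\alpha}{\lambda} - \frac{\alpha}{2\lambda\sigma_s^2}\left[\frac{(\lambda^2 + (1-\kappa)^2)(1 + a\alpha) - 2\lambda(1-\kappa)(a + \alpha)}{(1 - a\alpha)(1 - \alpha^2)}\,\sigma_x^2 + \frac{\kappa^2}{1-\alpha^2}\,\sigma_{y|x}^2\right], \tag{30}$$

where

$$\alpha = \frac{2\lambda}{1 + \lambda^2 + \beta + \big((1 - \lambda^2 - \beta)^2 + 4\beta\big)^{1/2}}, \qquad \beta = \frac{\sigma_\pi^2}{\sigma_s^2}, \tag{31}$$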
and $\Lambda^* = \mathbb{E}_X[\ln K(X)]$. The model has seven parameters, four to describe the environment, $\sigma_s^2$ for the selectivity of
the environment, $a$ for the correlation between successive environments, $\sigma_x^2$ for the amplitude of their fluctuations
and $\sigma_{y|x}^2$ for the (extrinsic) noise of the cue, and three to describe the strategy $\pi$: $\sigma_\pi^2$, $\lambda$ and $\kappa$.

The two limits of no inheritance and perfect selectivity correspond, respectively, to the limits $\lambda \to 0$ and $\sigma_s^2 \to 0$.
We show below how, in these limits, the growth rate of this continuous model has a decomposition similar to the
decomposition of the growth rate of the discrete model. With the continuous Gaussian model, however, explicit
formulae for the values of information can be obtained even when they do not coincide with a mutual information.
Models with constraints and not assuming any of these limits can also be treated in this same framework (16) [see
Appendix G for the link between this model and the model in (16)].
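
As a consistency check on the Gaussian model, the sketch below compares a closed-form growth rate with a direct simulation of the population dynamics. Both rest on assumptions that should be read as a reconstruction rather than as the reference derivation: the population density is taken to be Gaussian (mean `mean`, variance `var`), selection is $G_{\sigma_s^2}(\phi_t - x_t)$, transmission is $G_{\sigma_\pi^2}(\phi_t - \lambda\phi_{t-1} - \kappa y_t)$, and the closed form is $\Lambda - \Lambda^* = -\tfrac12\ln(2\pi\sigma_s^2) + \tfrac12\ln(\alpha/\lambda) - \tfrac{\alpha}{2\lambda\sigma_s^2}[\dots]$ with $\alpha$ the smaller root of $\lambda\alpha^2 - (1+\lambda^2+\beta)\alpha + \lambda = 0$ and $\beta = \sigma_\pi^2/\sigma_s^2$. The function names are hypothetical, and the closed form requires $\lambda > 0$.

```python
import math
import random

def alpha(lam, beta):
    """Smaller root of lam*alpha^2 - (1 + lam^2 + beta)*alpha + lam = 0."""
    s = 1.0 + lam ** 2 + beta
    return 2.0 * lam / (s + math.sqrt((1.0 - lam ** 2 - beta) ** 2 + 4.0 * beta))

def growth_rate_closed_form(lam, kappa, var_pi, var_s, a, var_x, var_y_x):
    """Lambda - Lambda* from the assumed closed form (requires lam > 0)."""
    al = alpha(lam, var_pi / var_s)
    q = 1.0 - kappa
    signal = ((lam ** 2 + q ** 2) * (1 + a * al) - 2 * lam * q * (a + al)) * var_x
    signal /= (1 - a * al) * (1 - al ** 2)
    noise = kappa ** 2 * var_y_x / (1 - al ** 2)
    return (-0.5 * math.log(2 * math.pi * var_s) + 0.5 * math.log(al / lam)
            - al / (2 * lam * var_s) * (signal + noise))

def growth_rate_simulated(lam, kappa, var_pi, var_s, a, var_x, var_y_x,
                          steps=200_000, burn=1_000, seed=0):
    """Time average of ln(W_t / K(x_t)) for a Gaussian population density."""
    rng = random.Random(seed)
    sd_step = math.sqrt(var_x * (1 - a * a))   # keeps the variance of x equal to var_x
    sd_cue = math.sqrt(var_y_x)
    x = rng.gauss(0.0, math.sqrt(var_x))
    mean, var = 0.0, var_s                     # arbitrary initial population
    total = 0.0
    for t in range(steps + burn):
        x = a * x + rng.gauss(0.0, sd_step)
        y = x + rng.gauss(0.0, sd_cue)
        m_pred = lam * mean + kappa * y        # after transmission and sensing
        v_pred = lam * lam * var + var_pi
        v_tot = var_s + v_pred
        if t >= burn:
            total += -0.5 * math.log(2 * math.pi * v_tot) - (m_pred - x) ** 2 / (2 * v_tot)
        mean = (var_s * m_pred + v_pred * x) / v_tot   # after selection
        var = var_s * v_pred / v_tot
    return total / steps

params = dict(lam=0.5, kappa=0.4, var_pi=0.2, var_s=1.0, a=0.7, var_x=1.0, var_y_x=0.5)
closed = growth_rate_closed_form(**params)
simulated = growth_rate_simulated(**params)
print(closed, simulated)
```

Because products and convolutions of Gaussians remain Gaussian, tracking the mean and variance of the population is exact for a Gaussian initial condition; the simulation therefore tests the algebra of the closed form rather than an approximation.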


B. No inheritance 


In the absence of inheritance ($\lambda = 0$), Eq. (30) becomes (see Appendix E):

$$\Lambda = \Lambda^* - h(X) + I(X;Y) - \mathbb{E}_Y\big[D\big(P_{X|Y}(\cdot|Y)\,\|\,\hat\pi(\cdot|Y)\big)\big], \tag{32}$$

where $\hat\pi(x|y) = G_{\sigma_s^2} * \pi(x|y) = \int d\phi\, G_{\sigma_s^2}(\phi - x)\,\pi(\phi|y) = G_{\sigma_s^2+\sigma_\pi^2}(x - \kappa y)$ represents an effective strategy as in
Eq. (11), and where

$$h(X) = \frac{1}{2}\ln(2\pi e\,\sigma_x^2) \quad\text{and}\quad I(X;Y) = \frac{1}{2}\ln\left(1 + \frac{\sigma_x^2}{\sigma_{y|x}^2}\right). \tag{33}$$

The only difference with Eq. (14) is the presence of a differential entropy $h(X)$ instead of the entropy $H(X)$. The
differential entropy is generally defined for continuous random variables as $h(X) = -\int dx\, P_X(x)\ln P_X(x)$. While the
mutual information $I(X;Y) = h(X) - h(X|Y)$ corresponds to a limit of discrete mutual informations when $X$ is
discretized into an increasing number of bits, the discrete entropy diverges in this limit, and the differential entropy
$h(X)$ represents only the non-diverging part (6). This divergence is compensated here by the divergence of $S(\phi_t,x_t)$
when $\sigma_s^2 \to 0$ [see Eq. (26)].

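The Gaussian model has the advantage over the discrete model that the value of acquired information $I_{\rm acquired}$ given by Eq. (19) can be evaluated explicitly. Assuming that $\pi$ is not subject to any additional constraint, three cases must be distinguished (25):

(i) if $\sigma_s^2 \le \sigma^2_{x|y}$, where $\sigma^2_{x|y} = \sigma_x^2\sigma^2_{y|x}/(\sigma_x^2 + \sigma^2_{y|x})$, the two equations $P_{X|Y} = G_{\sigma_s^2} * \pi$ and $P_X = G_{\sigma_s^2} * \pi$ have a solution, respectively given by $\sigma_\pi^2 = \sigma^2_{x|y} - \sigma_s^2$, $\kappa = 1/(1 + \sigma^2_{y|x}/\sigma_x^2)$, and $\sigma_\pi^2 = \sigma_x^2 - \sigma_s^2$, $\kappa = 0$; in this case,

$$I_{\rm acquired} = I(X;Y) = \frac{1}{2}\ln\left(1 + \frac{\sigma_x^2}{\sigma^2_{y|x}}\right), \tag{34}$$

(ii) if $\sigma^2_{x|y} < \sigma_s^2 < \sigma_x^2$, $P_X = G_{\sigma_s^2} * \pi$ has a solution but not $P_{X|Y} = G_{\sigma_s^2} * \pi$, and $D(P_{X|Y}\|G_{\sigma_s^2} * \pi) > 0$ is minimized with $\sigma_\pi^2 = 0$, $\kappa = 1/(1 + \sigma^2_{y|x}/\sigma_x^2)$; in this case,

$$I_{\rm acquired} = I(X;Y) - D(G_{\sigma^2_{x|y}}\|G_{\sigma_s^2}) = \frac{1}{2}\left(\ln\frac{\sigma_x^2}{\sigma_s^2} + 1 - \frac{\sigma^2_{y|x}\sigma_x^2}{(\sigma^2_{y|x} + \sigma_x^2)\sigma_s^2}\right), \tag{35}$$

(iii) if $\sigma_x^2 < \sigma_s^2$, neither $P_{X|Y} = G_{\sigma_s^2} * \pi$ nor $P_X = G_{\sigma_s^2} * \pi$ have solutions and

$$I_{\rm acquired} = I(X;Y) - D(G_{\sigma^2_{x|y}}\|G_{\sigma_s^2}) + D(G_{\sigma_x^2}\|G_{\sigma_s^2}) = \frac{\sigma_x^4}{2(\sigma^2_{y|x} + \sigma_x^2)\sigma_s^2}. \tag{36}$$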
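
The three regimes can be gathered in a single function. The sketch below is only an illustration: the names `acquired_information`, `gaussian_kl` and the variance arguments are ours, and the cases are built from the mutual information of Eq. (33) and the Gaussian relative entropy of Appendix C rather than from the simplified closed forms.

```python
import math


def mutual_information(s2_x, s2_yx):
    """I(X;Y) for a Gaussian signal of variance s2_x and additive noise of variance s2_yx."""
    return 0.5 * math.log(1.0 + s2_x / s2_yx)


def gaussian_kl(s2_0, s2_1, shift=0.0):
    """Relative entropy D(G_{s2_0}(. - x0) || G_{s2_1}(. - x1)), with shift = x1 - x0."""
    return 0.5 * ((s2_0 + shift**2) / s2_1 - 1.0 - math.log(s2_0 / s2_1))


def acquired_information(s2_x, s2_yx, s2_s):
    """Value of acquired information without inheritance, for selectivity s2_s."""
    s2_xy = s2_x * s2_yx / (s2_x + s2_yx)  # posterior variance of X given Y
    info = mutual_information(s2_x, s2_yx)
    if s2_s <= s2_xy:  # case (i): the Bayesian strategy is reachable
        return info
    if s2_s <= s2_x:  # case (ii): only the strategy without cue is reachable
        return info - gaussian_kl(s2_xy, s2_s)
    # case (iii): neither strategy is reachable
    return info - gaussian_kl(s2_xy, s2_s) + gaussian_kl(s2_x, s2_s)
```

With this construction the value is continuous across the two thresholds $\sigma_s^2 = \sigma^2_{x|y}$ and $\sigma_s^2 = \sigma_x^2$, and in case (iii) it reduces to $\sigma_x^4/(2(\sigma^2_{y|x}+\sigma_x^2)\sigma_s^2)$.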


These formulae show how the value of information can depend on the degree of selectivity $\sigma_s^2$ of the environment, in addition to the signal-to-noise ratio $\sigma_x^2/\sigma^2_{y|x}$ that controls the mutual information (Figure 3A).


These different cases are associated with qualitatively different optimal strategies: (i) corresponds to an effective Bayesian strategy, $\tilde\pi = P_{X|Y}$, but (ii) and (iii) to a deterministic response, $\phi_t = \kappa y_t$, also known as a “pure strategy” in game theory. This latter case is an example where a Bayesian inference of $x_t$ given $y_t$ is pointless: the optimal strategy is simply to act as if the information was noise-less, with only the multiplication factor $\kappa$ to account for the presence of noise.

FIG. 3 Value of the acquired information in different limits of the Gaussian model - A. No inheritance but finite selectivities $\sigma_s^2$, with $\sigma_s^2 = 0$ (red curve) corresponding to the limit of perfect selectivity where the value of acquired information is given by the mutual information $I(X;Y) = (1/2)\ln(1 + \sigma_x^2/\sigma^2_{y|x})$. A finite selectivity leads to smaller values for the acquired information (blue and green curves). Here $\sigma_x^2 = \mathbb{E}[X_t^2]$ represents the variance of the selective pressure $x_t$ and $\sigma^2_{y|x} = \mathbb{E}[Y_t^2|X_t]$ the noise in the environmental cue $y_t$. B. Perfect selectivity but inheritance, with $a = 0$ (red curve) corresponding to the limit where inheritance has no value because the environment has no temporal correlations. In this case, and in this case only, the value of acquired information is given by the mutual information $I(X;Y)$ (red curve), otherwise it has a lower value (blue and green curves). C. Extrinsic versus intrinsic informations, with no inheritance and perfect selectivity, but with a possibly noisy individual sensor $C(\psi_t|y_t)$ (as in Figure 2A). When the sensor is noise-less ($\psi_t = y_t$) but the cue $y_t$ has a noise $\sigma^2_{y|x}$, the value of acquired information is given by the mutual information $I(X;Y)$ (red curve, as in A and B, but note the difference of scale along the y-axis). When the cue is noise-less ($y_t = x_t$) but the sensor has a noise $\sigma^2_{\psi|x}$, the value of acquired information is higher (blue curve).


C. Perfect selectivity 

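In the limit of perfect selectivity $\sigma_s^2 \to 0$, we verify that

$$\Lambda = \Lambda^* - h(X_1|X_0) + I(X_1;Y_1|X_0) - \mathbb{E}_Y[D(P_{X|Y}(.|Y)\|\pi(.|Y))], \tag{37}$$

which is similar to Eq. (22) but with a conditional differential entropy $h(X_1|X_0)$ instead of the entropy $H(X_1|X_0)$. We have explicitly (see Appendix C):

$$h(X_1|X_0) = \frac{1}{2}\ln(2\pi e\sigma^2_{x_1|x_0}), \qquad I(X_1;Y_1|X_0) = \frac{1}{2}\ln\left(1 + \frac{\sigma^2_{x_1|x_0}}{\sigma^2_{y_1|x_1}}\right). \tag{38}$$

Given Eq. (33) and $\sigma^2_{x_1|x_0} = (1 - a^2)\sigma_x^2 \le \sigma_x^2$, we verify that $I(X_1;Y_1|X_0) \le I(X_1;Y_1)$, with a strict inequality if successive environments are not independent (Figure 3B).


If $\pi$ is not constrained, the optimal strategy is $\pi = P_{X_1|X_0,Y_1}$, which corresponds to (see Appendix C):

$$\kappa = \frac{1}{1 + \sigma^2_{y_1|x_1}/\sigma^2_{x_1|x_0}}, \qquad \lambda = a(1 - \kappa), \qquad \sigma_\pi^2 = (1 - \kappa)\sigma^2_{x_1|x_0} = \kappa\sigma^2_{y_1|x_1}. \tag{39}$$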
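
The optimal parameters of Eq. (39) and the inequality $I(X_1;Y_1|X_0) \le I(X_1;Y_1)$ are easy to evaluate numerically. The sketch below uses hypothetical function names (`optimal_strategy`, `informations`) and relies only on standard Gaussian identities: precisions add under Bayes' rule, and $I(X_0;X_1) = -\frac{1}{2}\ln(1-a^2)$ for the stationary process of Eq. (28).

```python
import math


def optimal_strategy(a, s2_x1x0, s2_y1x1):
    """Parameters (kappa, lambda, sigma_pi^2) of the unconstrained optimum pi = P_{X1|X0,Y1}."""
    kappa = 1.0 / (1.0 + s2_y1x1 / s2_x1x0)
    lam = a * (1.0 - kappa)
    s2_pi = 1.0 / (1.0 / s2_x1x0 + 1.0 / s2_y1x1)  # posterior variance: precisions add
    return kappa, lam, s2_pi


def informations(a, s2_x1x0, s2_y1x1):
    """Return (I(X1;Y1|X0), I(X1;Y1), I(X0;X1)) for the stationary Gaussian model."""
    s2_x = s2_x1x0 / (1.0 - a**2)  # stationary variance of the environment
    acquired_given_past = 0.5 * math.log(1.0 + s2_x1x0 / s2_y1x1)
    acquired_alone = 0.5 * math.log(1.0 + s2_x / s2_y1x1)
    inherited_alone = -0.5 * math.log(1.0 - a**2)
    return acquired_given_past, acquired_alone, inherited_alone
```

For $a = 0$ the first two quantities coincide and the third vanishes, which is the red curve of Figure 3B.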


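While the value of acquired information is determined by $I(X_1;Y_1|X_0)$, the value of inherited information is determined by

$$I(X_1;X_0|Y_1) = \frac{1}{2}\ln\left(1 + \frac{a^2\sigma^2_{x_1|y_1}}{\sigma^2_{x_0|x_1}}\right) = \frac{1}{2}\ln\left(1 + \frac{a^2\sigma^2_{x_1|y_1}}{\sigma^2_{x_1|x_0}}\right). \tag{40}$$

Finally, the total value of the two informations, given in Eq. (24), is at most $I(X_1;Y_1|X_0) + I(X_0;X_1)$, i.e.,

$$I_{\rm tot} \le \frac{1}{2}\ln\left[\frac{1}{1 - a^2}\left(1 + \frac{\sigma^2_{x_1|x_0}}{\sigma^2_{y_1|x_1}}\right)\right]. \tag{41}$$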


These formulae show how the value of acquired information depends on the presence of inherited information when successive environments are correlated ($a > 0$).

D. Common and individual informations

The formulae presented so far assume the absence of constraints on $\pi$. They have to be corrected in the presence of a noisy individual sensor, as shown by Eq. (20) in absence of inheritance. To illustrate this case in the simplest setting, we assume here both an absence of inheritance ($\lambda = 0$) and a perfect selectivity ($\sigma_s^2 = 0$), in which case Eq. (20) becomes

$$I_{\rm acquired} = I(X;Y) - \min_{\sigma_\pi^2,\kappa}\mathbb{E}_Y[D(P_{X|Y}(.|Y)\|G_{\sigma_\pi^2+\kappa^2\sigma^2_{\psi|y}}(\cdot - \kappa Y))] \tag{42}$$

since $\min_\pi D(P_X\|\pi) = 0$ with $\pi = P_X$. Two cases must be distinguished:

(i) if $\kappa_0^2\sigma^2_{\psi|y} \le \sigma^2_{x|y}$, where $\kappa_0 = 1/(1 + \sigma^2_{y|x}/\sigma_x^2)$ and $\sigma^2_{x|y} = \kappa_0\sigma^2_{y|x}$, the equation $P_{X|Y}(x|y) = G_{\sigma_\pi^2+\kappa^2\sigma^2_{\psi|y}}(x - \kappa y)$ has a solution given by $\kappa = \kappa_0$, $\sigma_\pi^2 = \sigma^2_{x|y} - \kappa_0^2\sigma^2_{\psi|y}$, and $I_{\rm acquired} = I(X;Y)$.

(ii) if $\kappa_0^2\sigma^2_{\psi|y} > \sigma^2_{x|y}$, we have necessarily $D(P_{X|Y}\|G_{\sigma_\pi^2+\kappa^2\sigma^2_{\psi|y}}(\cdot - \kappa Y)) > 0$ and $\sigma_\pi^2 = 0$ but, generally, $\kappa \ne \kappa_0$.

An illustration of this second case is provided by a model where $y_t = x_t$ but $\psi_t \ne x_t$, i.e., $\sigma^2_{y|x} = 0$ and $\sigma^2_{\psi|y} = \sigma^2_{\psi|x}$. For this particular model, the value of acquired information is (see Appendix F):

$$I_{\rm acquired} = \frac{1}{2}\left(\kappa - \ln(1 - \kappa)\right), \quad \text{with} \quad \kappa = \frac{(\zeta^2 + 4\zeta)^{1/2} - \zeta}{2} \quad \text{and} \quad \zeta = \frac{\sigma_x^2}{\sigma^2_{\psi|x}}. \tag{43}$$

This formula shows that the value of acquired information can be strictly larger than the mutual information between the input and output of the sensor $C$, since

$$I_{\rm acquired} \ge I(X;\Psi) = \frac{1}{2}\ln(1 + \zeta) \tag{44}$$

with equality if and only if $\zeta = 0$ (Figure 3C).
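
The gap between the value of acquired information and the mutual information of the sensor can be checked numerically. In the sketch below, which uses our own function names, $\kappa$ is taken as the root in $(0,1)$ of $\kappa^2 + \zeta\kappa - \zeta = 0$, i.e., the stationarity condition of Appendix F divided by $\sigma^2_{\psi|x}$; this closed form is our rewriting and is stated here as an assumption.

```python
import math


def sensor_limited_information(zeta):
    """Value of acquired information when y = x and the individual sensor has
    signal-to-noise ratio zeta = sigma_x^2 / sigma_{psi|x}^2 > 0.
    Returns the pair (I_acquired, kappa)."""
    # root in (0, 1) of kappa^2 + zeta * kappa - zeta = 0
    kappa = 0.5 * (math.sqrt(zeta**2 + 4.0 * zeta) - zeta)
    return 0.5 * (kappa - math.log(1.0 - kappa)), kappa


def sensor_mutual_information(zeta):
    """Mutual information I(X;Psi) between the input and output of the sensor."""
    return 0.5 * math.log(1.0 + zeta)
```

For any $\zeta > 0$ the first quantity exceeds the second, in line with the blue curve lying above the red one in Figure 3C.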


IV. FROM EVOLUTIONARY DYNAMICS TO THERMODYNAMICS 

The problem of formalizing and quantifying the notion of information also lies at the foundations of thermodynamics. As pointed out by Maxwell in a famous thought experiment, an intelligent being may take advantage of microscopic measurements to extract work from a single heat bath, in apparent contradiction with the second law of thermodynamics (30). Maxwell’s demon is today at the center of an active field of research, stochastic thermodynamics, where many results involve information theoretic quantities (24). Recently, Vinkler, Permuter and Merhav showed that the two problems of optimizing the growth rate of a population and optimizing the work extracted from a feedback-controlled thermodynamical system are formally related (23). Here, we present and develop this analogy, first with a simple two-state model, then with more generic discrete and Gaussian models.


A. Simple two-state system 

As one of the simplest thermodynamical systems with feedback control, we consider a model where a particle can be in two states, either “down” in potential $V = 0$ or “up” in potential $V = \Delta E > 0$ (Figure 4). The particle is initially at thermal equilibrium with a heat bath at inverse temperature $\beta = 1/(k_B T)$, so that it has probability $p_0 = 1/(1 + e^{-\beta\Delta E})$ to be in the down state, and probability $p_1 = 1 - p_0$ to be in the up state. At regular intervals of time $\tau$, long compared to the equilibration time, a demon can choose to suddenly switch the two levels, thus bringing down the particle if it was up and up if it was down. In doing so, he can extract a work $W = +\Delta E$ if the particle was in the up state, while losing $W = -\Delta E$ otherwise. In absence of information on the location of the particle, the expected outcome of the operation is $\mathbb{E}[W] = (p_1 - p_0)\Delta E < 0$, a negative result in agreement with the impossibility to extract work from a single heat bath. If the demon knows exactly the location of the particle, on the other hand, he can decide to switch the potential only when the particle is in the up state. As this happens with probability $p_1$, he can thus expect to extract a positive work, $\mathbb{E}[W] = p_1\Delta E > 0$. In the intermediate situation, which we now examine, the demon makes a noisy measurement of the location of the particle and must devise a strategy to optimize the extracted work.

FIG. 4 Optimal control of a two-level system - A particle at equilibrium with a heat bath can be in two states: an up state with energy $+\Delta E$ ($x = 1$) or a down state with energy 0 ($x = 0$). A measurement is made which indicates, with an error rate $\epsilon < 1/2$, whether the particle is up ($y = 1$) or down ($y = 0$). Based on this measurement, a demon can choose to switch the two levels, thus extracting a work $W = +\Delta E$ if the particle was up and performing a work $W = -\Delta E$ if it was down. To extract a maximal work on average, the optimal strategy of the demon is to switch the two levels if and only if the particle is measured in the up state ($y = 1$), as indicated at the bottom.

To formalize the problem, let us denote by $x$ the state of the particle at the time of a measurement, with $x = 1$ if it is in the up state and $x = 0$ otherwise (Figure 4). Immediately before the demon makes a decision to switch or not the potential, the particle has thus an energy $E_0(x) = x\Delta E$. If $\phi = 1$ denotes the choice to switch the potential and $\phi = 0$ the choice to leave it unchanged, the energy of the particle after making and implementing choice $\phi$ is $E_1(x|\phi) = |\phi - x|\Delta E$ and the extracted work is $W(x,\phi) = E_0(x) - E_1(x|\phi) = (2x - 1)\phi\Delta E$. Let us now consider the outcome $y$ of a measurement of $x$, whose noise is characterized by a conditional probability $P_{Y|X}(y|x)$; for instance, $x = y$ with probability $1 - \epsilon$, but $x = 1 - y$ with an error rate $\epsilon$ (binary symmetric channel). A strategy choosing $\phi$ given $y$ with probability $p(\phi|y)$ will extract a mean work

$$\mathbb{E}[W] = \mathbb{E}_{X,Y}\mathbb{E}_\Phi[W(X,\Phi)|Y] = \sum_{x,y,\phi} P_X(x)P_{Y|X}(y|x)p(\phi|y)W(x,\phi), \tag{45}$$

where $P_X(x) = e^{-\beta E_0(x)}/Z_0$ is the equilibrium distribution that describes the particle at the time of the measurement, with $Z_0 = 1 + e^{-\beta\Delta E}$. Because of the linearity of Eq. (45), the question of finding a strategy $p(\phi|y)$ that optimizes the mean extracted work $\mathbb{E}[W]$ has a trivial answer: it is simply to switch the potential ($\phi = 1$) if and only if the state $x = 1$ is the most likely given $y$. With a binary symmetric channel with error rate $\epsilon < 1/2$, this corresponds to the pure strategy $\phi = y$, i.e., $p(\phi|y) = \delta(\phi, y)$, which results in

$$\mathbb{E}[W] = \mathbb{E}_{X,Y}[W(X,Y)] = \sum_{x,y} P_X(x)P_{Y|X}(y|x)W(x,y). \tag{46}$$

More generally, the outcome of a measurement determines the optimal decision, $\phi = \phi(y)$, and without loss of generality we can assume that the signal directly indicates the optimal choice, $y = \phi$.

An analogy with models of population dynamics arises when introducing the conditional probability $\pi(x|y) = e^{-\beta E_1(x|y)}/Z_1(y)$, where $Z_1(y) = \sum_x e^{-\beta E_1(x|y)}$ (23). Because $Z_1(y) = 1 + e^{-\beta\Delta E} = Z_0$ for all $y$, we can indeed write the extracted work as

$$W(x,y) = E_0(x) - E_1(x|y) = \beta^{-1}\ln\frac{\pi(x|y)}{P_X(x)} \tag{47}$$

and, after averaging,

$$\mathbb{E}[W] = \beta^{-1}\,\mathbb{E}_{X,Y}\left[\ln\frac{\pi(X|Y)}{P_X(X)}\right]. \tag{48}$$

Up to a multiplying factor $\beta$, this expression is formally identical to the expression for the growth rate $\Lambda$ of a discrete Kelly model given in Eq. (13), with $K(x) = 1/P_X(x)$. This particular value of $K(x)$ has a simple interpretation in gambling: it defines a fair game, with $\Lambda = \mathbb{E}_X \ln K(X) - H(X) = 0$. From the standpoint of Kelly’s model, the choice of a potential $E_1(x|y)$ thus appears as the choice of a strategy. Following Eq. (18), the mean extracted work satisfies

$$\beta\mathbb{E}[W] = I(X;Y) - \mathbb{E}_Y[D(P_{X|Y}(.|Y)\|\pi(.|Y))]. \tag{49}$$

Irrespective of the measurement scheme, the extracted work is therefore bounded by the mutual information between the actual and measured locations of the particle: $\beta\mathbb{E}[W] \le I(X;Y)$.

B. General discrete systems

Reaching the bound $\beta\mathbb{E}[W] = I(X;Y)$ requires a potential $E_1(x|y)$ verifying $e^{-\beta E_1(x|y)}/Z_1(y) = P_{X|Y}(x|y)$. This potential, however, need not satisfy $Z_1(y) = Z_0$ for all $y$. Introducing the free energies $F_0 = -\beta^{-1}\ln Z_0$ and $F_1(y) = -\beta^{-1}\ln Z_1(y)$, the expression for the extracted work when the particle is in $x$ and the measurement indicates $y$, Eq. (47), generalizes to

$$W_0(x,y) = E_0(x) - E_1(x|y) = \beta^{-1}\ln\frac{\pi(x|y)}{P_X(x)} + F_0 - F_1(y). \tag{50}$$

On average, the demon will thus extract

$$\mathbb{E}[W_0] = \beta^{-1}\left(I(X;Y) - \mathbb{E}_Y[D(P_{X|Y}(.|Y)\|\pi(.|Y))]\right) - \Delta F, \tag{51}$$

where $\Delta F = \mathbb{E}_Y[F_1(Y)] - F_0$. This quantity is analogous to a difference of free energies, but note that the state of the system immediately after the operation is generally not an equilibrium state. Eq. (51) implies the inequality

$$\mathbb{E}[W_0] \le \beta^{-1}I(X;Y) - \Delta F. \tag{52}$$

This inequality corresponds to a known generalization of the second law of thermodynamics in presence of feedback (31). It is more frequently written $\mathbb{E}[W_0'] - \Delta F \ge -\beta^{-1}I(X;Y)$, where $W_0' = -W_0$ is the work performed on the system (24). We follow here the opposite convention of counting positively the extracted work for consistency with the sign of the growth rate in the evolutionary model.

To define a cyclic process, the particle needs to be brought back to equilibrium in $E_0(x)$. To this end, the demon has to perform a work $W_f(y) \ge F_0 - F_1(y) + W_{\rm irr}(y)$, where the irreversible work $W_{\rm irr}(y) = -\beta^{-1}D(P_{X|Y}(.|y)\|\pi(.|y))$ is non-zero when the distribution $P_{X|Y}(x|y)$ of the particle immediately after the measurement differs from the equilibrium distribution in the potential $E_1(x|y)$ (32). This work $W_f(y)$ performed on the system is to be subtracted from the extracted work $W_0(x,y)$ when estimating the net extracted work over a complete cycle, $W(x,y) = W_0(x,y) - W_f(y)$. On average, this results in an extracted work satisfying

$$\beta\mathbb{E}[W] \le I(X;Y). \tag{53}$$

This inequality becomes an equality if $\pi = P_{X|Y}$ and the restoration of the original potential is quasi-static, a protocol known to be optimal for discrete-feedback thermodynamic engines (33).
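
For the two-state model of Figure 4, the bound $\beta\mathbb{E}[W] \le I(X;Y)$ can be checked directly. The sketch below is a minimal illustration with names of our choosing (`two_state_demon`, `beta_dE`, `eps`); it assumes the pure strategy $\phi = y$ and a binary symmetric channel with error rate `eps`, and works in units where the work is multiplied by $\beta$.

```python
import math


def two_state_demon(beta_dE, eps):
    """Return (beta * E[W], I(X;Y)) for the strategy phi = y.

    beta_dE is the dimensionless gap beta * Delta E, eps the measurement error rate.
    """
    p1 = math.exp(-beta_dE) / (1.0 + math.exp(-beta_dE))  # probability of the up state
    px = {0: 1.0 - p1, 1: p1}

    def channel(y, x):
        # binary symmetric channel P(y | x)
        return 1.0 - eps if y == x else eps

    # mean work, with beta * W(x, phi) = (2x - 1) * phi * beta_dE and phi = y
    beta_work = sum(
        px[x] * channel(y, x) * (2 * x - 1) * y * beta_dE
        for x in (0, 1)
        for y in (0, 1)
    )
    py = {y: sum(px[x] * channel(y, x) for x in (0, 1)) for y in (0, 1)}
    info = sum(
        px[x] * channel(y, x) * math.log(channel(y, x) / py[y])
        for x in (0, 1)
        for y in (0, 1)
        if channel(y, x) > 0.0
    )
    return beta_work, info
```

With a noiseless measurement (`eps = 0`) the first value reduces to $p_1\beta\Delta E$, and for every error rate the first value stays below the second, the difference being the relative entropy term of Eq. (49).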

The mapping presented so far is to Kelly’s model, which corresponds to taking two limits in the evolutionary model, 
a limit of perfect selectivity and a limit of no inheritance. We now examine how the analogy may be extended beyond 
these two limits. 


C. Inheritance 

Extensions to include inheritance (better called “memory” in this context) are considered in (23). A direct mapping to an evolutionary model with inheritance but perfect selectivity is to assume a multi-step process in which the system is brought back to equilibrium in a new potential $E_0^t(x)$ every time, where $E_0^t(x)$ differs from but is correlated to $E_0^{t-1}(x)$. A more interesting extension, however, is to consider that the particle does not equilibrate with the thermal bath before a new measurement and change of potential are made. Physically, equilibration takes time and instead of extracting a maximal work $\mathbb{E}[W]$, it may be more desirable to extract a maximal power $\mathcal{P} = \mathbb{E}[W]/\tau$, where $\tau$, the time taken by a cycle, may itself be optimized. We present in Appendix H an extension of Eq. (51) to cover such non-equilibrium protocols. While the extracted work can still be written with information theoretic quantities, their interpretation is complicated by the fact that the state $x_t$ of the system prior to a measurement now depends on
the series of choices = {fi,... made by the demon. From the standpoint of the evolutionary model, this 

corresponds to a feedback from the state of the population to the state of the environment, a biologically relevant 
phenomenon that could be further studied within the present framework. 


D. Finite selectivity 

A mapping to an evolutionary model with finite selectivity is for instance obtained by assuming a separation of scales between a macro-state $x$, which is measured, and micro-states $\phi$, which are manipulated, with $S(\phi,x)$ representing the density of states, i.e., the number of micro-states $\phi$ associated with the macro-state $x$. In this mapping, the demon makes a macroscopic measurement of $x$ but, given the result $y$, can tune every microscopic energy level from $E_0(\phi)$ to $E_1(\phi|y)$. Assuming that we start and end with the micro-states at equilibrium given their macro-state, Eq. (47) becomes $W_0(x,y) \le F_0(x) - F_1(x|y)$ where $F_0(x) = -\beta^{-1}\ln(\sum_\phi S(\phi,x)e^{-\beta E_0(\phi)})$ is the free energy of a system at equilibrium in macro-state $x$, and $F_1(x|y) = -\beta^{-1}\ln(\sum_\phi S(\phi,x)e^{-\beta E_1(\phi|y)})$ the free energy at equilibrium in the new distribution of energy levels. By writing again $e^{-\beta E_1(\phi|y)} = \pi(\phi|y)e^{-\beta F_1(y)}$, we obtain

$$W_0(x,y) \le F_0(x) + \beta^{-1}\ln\Big(\sum_\phi S(\phi,x)\pi(\phi|y)\Big) - F_1(y), \tag{54}$$

and therefore

$$\mathbb{E}[W_0] \le \mathbb{E}_X[F_0(X)] + \beta^{-1}\mathbb{E}_{X,Y}\Big[\ln\Big(\sum_\phi S(\phi,X)\pi(\phi|Y)\Big)\Big] - \mathbb{E}_Y[F_1(Y)], \tag{55}$$

with equality if the energy levels are changed quasi-statically. The conditional probability $\pi$ is involved in the last two terms of the right-hand side, but, as in the two-state model of Figure 4, we may assume that the demon is constrained to $F_1(y) = F_0$ for all $y$ and that the last term is therefore independent of $\pi$. In this case, the problem of choosing the energy levels $E_1(\phi|y)$ so as to optimize the extracted work is formally identical to the problem of optimizing the growth rate of an evolutionary model with finite selectivity.


E. Gaussian systems 

Mapped to its thermodynamical analog, the Gaussian model of evolutionary dynamics becomes the problem of controlling a Brownian particle with harmonic potentials. The Gaussian distribution $P_X(x) = G_{\sigma_x^2}(x)$ is indeed the equilibrium distribution of a particle in contact with a heat bath at inverse temperature $\beta$ and in a potential $V_0(x) = kx^2/2$ when considering $\sigma_x^2 = (\beta k)^{-1}$. In the simplest version of the analogy, a demon observes a particle at equilibrium in this potential and measures its location $x$ as $y$, with a noise characterized by $P_{Y|X}(y|x) = G_{\sigma^2_{y|x}}(y - x)$. His problem is then to change the potential to $V_1(x|y)$ so as to extract a maximal work.

While changing the stiffness $k$ of the potential may allow the demon to extract more work, the simplest scenario is when only translations are allowed, from $V_0(x) = kx^2/2$ to $V_1(x|y) = k(x - \phi_1)^2/2$, a case where $Z_1(y) = Z_0 = (2\pi\sigma_x^2)^{1/2}$, and therefore $\Delta F = 0$ in Eq. (51). As a consequence of the formal mapping to an evolutionary model, the optimal strategy of the demon is to move the potential to $\phi_1 = \kappa y$ with $\kappa$ given by Eq. (39), i.e., $\kappa = 1/(1 + \sigma^2_{y|x}/\sigma_x^2)$. The optimal extracted work is the value of acquired information given by Eq. (35) when taking $\sigma_s^2 = \sigma_x^2$: $\beta\mathbb{E}[W] = (1 - \sigma^2_{x|y}/\sigma_x^2)/2 = (1 + \sigma^2_{y|x}/\sigma_x^2)^{-1}/2$. These expressions correspond to those obtained by a more direct calculation (34).
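
The direct calculation is short enough to sketch. With $x \sim G_{\sigma_x^2}$ and $y = x + $ noise, the averages $\mathbb{E}[xy] = \sigma_x^2$ and $\mathbb{E}[y^2] = \sigma_x^2 + \sigma^2_{y|x}$ give the mean work for a translation to $\phi_1 = \kappa y$ as a quadratic function of $\kappa$. The function name and arguments below are ours, and the expression assumes $\beta k = 1/\sigma_x^2$ as in the text.

```python
def beta_mean_work(kappa, s2_x, s2_yx):
    """beta * E[V0(x) - V1(x|y)] for V1(x|y) = k (x - kappa * y)^2 / 2 and beta * k = 1 / s2_x."""
    # beta * k / 2 * E[x^2 - (x - kappa * y)^2] = (2 kappa E[x y] - kappa^2 E[y^2]) / (2 s2_x)
    return (2.0 * kappa * s2_x - kappa**2 * (s2_x + s2_yx)) / (2.0 * s2_x)


def optimal_gain(s2_x, s2_yx):
    """Gain kappa = 1 / (1 + s2_yx / s2_x) that maximizes beta_mean_work."""
    return 1.0 / (1.0 + s2_yx / s2_x)
```

At the optimal gain the mean work equals $\kappa/2$ in units of $\beta^{-1}$, which is the value quoted above.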

If the process is repeated after quasi-statically restoring the potential at a location that is correlated but differs from its original location, the problem maps to the Gaussian model of evolutionary dynamics with inheritance. Specifically, it corresponds to beginning each cycle $t$ with the particle at equilibrium in $V_t(x) = k(x - x_t)^2/2$, where $x_t = ax_{t-1} + v_t$ and where $v_t$ is normally distributed with variance $\sigma^2_{x_1|x_0}$. This problem also maps to a problem of stochastic control solved by Kalman (22). In Kalman’s model, the state $x_t$ of a system, its measured state $y_t$ and its estimated state $\phi_t$ are assumed to follow the recursions

$$x_t = ax_{t-1} + v_t, \qquad v_t \sim \mathcal{N}(0, \sigma^2_{x_1|x_0}), \tag{56}$$

$$y_t = x_t + v'_t, \qquad v'_t \sim \mathcal{N}(0, \sigma^2_{y|x}), \tag{57}$$

$$\phi_t = \lambda\phi_{t-1} + \kappa y_t, \tag{58}$$

and the objective is to find the estimation $\phi_t$ that minimizes the mean square error $\mathbb{E}[(\phi_t - x_t)^2]$ by choosing appropriately the two parameters $\lambda$ and $\kappa$. A standard application is for instance to tracking, where the current position and velocity of a target must be estimated from past estimations and from independent measurements. The optimal values for $\lambda$ and $\kappa$ are also given by Eq. (39) (as for our model, a generalization to multidimensional variables is straightforward).


A physically more interesting situation is when the particle has no time to equilibrate before a new measurement and manipulation are made. The Gaussian setting is here again well-suited for making explicit calculations of the maximal work that may be extracted with such non-equilibrium protocols (35) (see also Appendix H). The results obtained for Brownian particles in harmonic potentials suggest that the feedback of a population onto its environment could also be studied analytically in Gaussian models of population dynamics.


V. DISCUSSION 

We reviewed an approach to quantify the value of informations in evolution by analyzing abstract models of 
population dynamics, and showed how analytical expressions can be obtained when considering a particular Gaussian 
limit. This approach illustrates how the value of an information may depend on factors beyond the characteristics of 
the channel that directly conveys it. In particular, it shows how the value of an information acquired from the current 
environment is tied to the value of the information inherited from previous generations. Alternative approaches for 
quantifying information are possible, for instance based on well-chosen sets of axioms (36), but at the risk of omitting 
an important feature of the problem. Although elementary, our model indicates that several constraints should 
generically be taken into account, including causality, selectivity of the environment and individual stochasticity. 
Studies of informations in thermodynamics take a similar approach of analyzing simple models and also find that 
different quantities for quantifying information may arise depending on the protocol (37). Remarkably, a similar 
mathematical formalism emerges from the two problems (23). 

This formal correspondence suggests that methods and concepts may be transferred between disciplines. In (23), 
the authors thus applied the concept of universal strategy from information theory (38) to devise a thermodynamical 
protocol that optimally extracts work when the statistical properties of the system, for instance the characteristics 
of the information channel, are unknown. Reciprocally, many results have been obtained recently in stochastic 
thermodynamics (24) which may provide new insights on evolutionary dynamics. For instance, inequalities on the 
mean extracted work are known to generalize to fluctuation theorems, which take into account fluctuations around 
the mean result and connect macroscopic observations to the underlying time-reversal symmetry of the microscopic 
dynamics. Given the analogy between extracted work and growth rate, similar relations may hold for population 
dynamics. One such fluctuation relation has in fact already been established for evolutionary dynamics by Mustonen
and Lässig (39), but at a different level of analysis: they considered fluctuations arising from finite population sizes,
which are ignored in the present analysis of our models. The path integral formalism at the core of their approach
has, however, its counterpart at our level of analysis (12).

Another challenge is to move beyond the formal analogy towards an integrated treatment of evolutionary and thermodynamical constraints. The presented models account for part of the evolutionary constraints but the information processor $\pi$, the sensor $C$ and the “replicator” $S$ are introduced as ad-hoc parameters, with no reference to physics or evolution. Several recent studies have investigated thermodynamical constraints on information processing (40), biochemical sensing (41) or replication (42), and others have investigated evolutionary constraints at the inter-molecular (43) and intra-molecular (44) levels. Given the interplay between local and global properties that simple models already exhibit, integrating these different constraints appears as both necessary and interesting.

APPENDICES

Appendix A: Mapping from Eq. (6) to Eq. (2) 

The model described by Eq. (6) is mapped to the model described by Eq. (2) by defining 

it = (0,Xt), yt = {yt,Zt) (Al) 

and 

7r((^t|0t_i,yt) = (A2) 

where (j>^ corresponds to the k-ih. component of 0t, i.e., (pt = Note that S is of the form S = KA as in 

Eq. (1) if A is itself of the form S = KA. 


Appendix B: Decomposition of the growth rate 


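We detail here the decomposition of the growth rate given in Eq. (14) for the discrete model in absence of inheritance. The idea is to write

$$\Lambda = \mathbb{E}_{X,Y}[\ln(K(X)\tilde\pi(X|Y))] = \mathbb{E}_X[\ln K(X)] + \mathbb{E}_X[\ln P_X(X)] + \mathbb{E}_{X,Y}\left[\ln\frac{P_{X|Y}(X|Y)}{P_X(X)}\right] + \mathbb{E}_{X,Y}\left[\ln\frac{\tilde\pi(X|Y)}{P_{X|Y}(X|Y)}\right] \tag{B1}$$

and to recognize that $\mathbb{E}_X[\ln P_X(X)] = -H(X)$, $\mathbb{E}_{X,Y}[\ln P_{X|Y}(X|Y)/P_X(X)] = I(X;Y)$ and

$$\mathbb{E}_{X,Y}\left[\ln\frac{\tilde\pi(X|Y)}{P_{X|Y}(X|Y)}\right] = \sum_{x,y} P_Y(y)P_{X|Y}(x|y)\ln\frac{\tilde\pi(x|y)}{P_{X|Y}(x|y)} = -\sum_y P_Y(y)D(P_{X|Y}(.|y)\|\tilde\pi(.|y)) = -\mathbb{E}_Y[D(P_{X|Y}(.|Y)\|\tilde\pi(.|Y))]. \tag{B2}$$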

Appendix C: Gaussian random variables 

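A Gaussian random variable $X$ is characterized by its mean $x_0 = \mathbb{E}[X]$ and its variance $\sigma_x^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2$, and its probability density is $P_X(x) = G_{\sigma_x^2}(x - x_0)$, where $G_{\sigma^2}(x) = (2\pi\sigma^2)^{-1/2}\exp(-x^2/2\sigma^2)$.

Its differential entropy $h(X) = -\int dx\, P_X(x)\ln P_X(x)$ is

$$h(X) = \frac{1}{2}\ln(2\pi e\sigma_x^2). \tag{C1}$$

The mutual information $I(X;Y) = h(X) - h(X|Y)$ between $X$ and another Gaussian random variable $Y$ whose conditional probability given $x$ is $P_{Y|X}(y|x) = G_{\sigma^2_{y|x}}(y - x)$ is

$$I(X;Y) = \frac{1}{2}\ln\left(1 + \frac{\sigma_x^2}{\sigma^2_{y|x}}\right). \tag{C2}$$

The relative entropy between two Gaussian probability densities is

$$D(G_{\sigma_0^2}(\cdot - x_0)\|G_{\sigma_1^2}(\cdot - x_1)) = \frac{1}{2}\left(\frac{\sigma_0^2 + (x_1 - x_0)^2}{\sigma_1^2} - 1 - \ln\frac{\sigma_0^2}{\sigma_1^2}\right). \tag{C3}$$

Finally, given $P_{X_1|X_0}(x_1|x_0) = G_{\sigma^2_{x_1|x_0}}(x_1 - ax_0)$ and $P_{Y_1|X_1}(y_1|x_1) = G_{\sigma^2_{y_1|x_1}}(y_1 - x_1)$, the conditional probability $P_{X_1|Y_1,X_0}$, which by Bayes’ rule is proportional to $P_{Y_1|X_1}P_{X_1|X_0}$, is Gaussian and given by

$$P_{X_1|Y_1,X_0}(x_1|y_1,x_0) = G_{\sigma^2_{x_1|y_1,x_0}}(x_1 - \lambda x_0 - \kappa y_1), \tag{C4}$$

with

$$\kappa = \frac{1}{1 + \sigma^2_{y_1|x_1}/\sigma^2_{x_1|x_0}}, \qquad \lambda = a(1 - \kappa), \qquad \sigma^2_{x_1|y_1,x_0} = (1 - \kappa)\sigma^2_{x_1|x_0},$$

or, equivalently, $\sigma^{-2}_{x_1|y_1,x_0} = \sigma^{-2}_{x_1|x_0} + \sigma^{-2}_{y_1|x_1}$.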

Appendix D: Growth rate of the Gaussian model 

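Eq. (30) for the growth rate $\Lambda$ of the Gaussian model is obtained by considering

$$n_t(\phi_t) = \frac{1}{W_t}K(x_t)\int d\phi_{t-1}\, G_{\sigma_s^2}(\phi_t - x_t)\,G_{\sigma_\pi^2}(\phi_t - \lambda\phi_{t-1} - \kappa y_t)\,n_{t-1}(\phi_{t-1}),$$

with $n_t(\phi_t)$ of the form $n_t(\phi_t) = G_{\sigma_t^2}(\phi_t - m_t)$, which leads to

$$\sigma_t^2 = \left(\sigma_s^{-2} + (\sigma_\pi^2 + \lambda^2\sigma_{t-1}^2)^{-1}\right)^{-1},$$

$$W_t = K(x_t)\,G_{\sigma_s^2+\sigma_\pi^2+\lambda^2\sigma_{t-1}^2}(\lambda m_{t-1} - x_t + \kappa y_t).$$

The variance $\sigma_t^2$ has a fixed point $\sigma_\infty^2$ in terms of which the growth rate can be rewritten as

$$\Lambda = \lim_{t\to\infty}\mathbb{E}[\ln W_t] = \Lambda^* - \frac{1}{2}\ln(2\pi\sigma_s^2) + \frac{1}{2}\ln\frac{\alpha}{\lambda} - \frac{\alpha}{2\lambda\sigma_s^2}\lim_{t\to\infty}\mathbb{E}[z_t^2], \tag{D5}$$

where $\Lambda^* = \mathbb{E}_X[\ln K(X)]$,

$$z_t = \lambda m_{t-1} - x_t + \kappa y_t,$$

$$\alpha = \frac{\lambda\sigma_s^2}{\sigma_s^2 + \sigma_\pi^2 + \lambda^2\sigma_\infty^2} = \frac{2\lambda}{1 + \lambda^2 + \beta + ((1 - \lambda^2 - \beta)^2 + 4\beta)^{1/2}}.$$

Given that $x_{t+1} = ax_t + b_t$ and $y_{t+1} = x_{t+1} + b'_{t+1}$ with $b_t \sim \mathcal{N}(0, \sigma^2_{x_1|x_0})$ and $b'_t \sim \mathcal{N}(0, \sigma^2_{y_1|x_1})$, we have

$$z_{t+1} = \alpha z_t + \epsilon x_t + (\kappa - 1)b_t + \kappa b'_{t+1}, \quad \text{with } \epsilon = \lambda - a(1 - \kappa).$$

Using $\sum_{k=0}^{t}\alpha^{t-k}x_k = \sum_{k=0}^{t}(\alpha^{t-k} - a^{t-k})/(\alpha - a)\, b_k$, we obtain

$$z_{t+1} = \sum_{k=0}^{t}\left(\frac{\delta\alpha^{t-k} - \epsilon a^{t-k}}{\alpha - a}\,b_k + \kappa\alpha^{t-k}b'_{k+1}\right), \quad \text{with } \delta = \lambda - \alpha(1 - \kappa),$$

and, since the $b_k$ and $b'_k$ are all independent, with variances $\mathbb{E}[b_k^2] = \sigma^2_{x_1|x_0}$ and $\mathbb{E}[b_k'^2] = \sigma^2_{y_1|x_1}$,

$$\lim_{t\to\infty}\mathbb{E}[z_t^2] = \frac{1}{(\alpha - a)^2}\left(\frac{\delta^2}{1 - \alpha^2} - \frac{2\delta\epsilon}{1 - a\alpha} + \frac{\epsilon^2}{1 - a^2}\right)\sigma^2_{x_1|x_0} + \frac{\kappa^2\sigma^2_{y_1|x_1}}{1 - \alpha^2}$$

$$= \frac{(\lambda^2 + (1 - \kappa)^2)(1 + a\alpha) - 2\lambda(1 - \kappa)(a + \alpha)}{(1 - a^2)(1 - a\alpha)(1 - \alpha^2)}\,\sigma^2_{x_1|x_0} + \frac{\kappa^2\sigma^2_{y_1|x_1}}{1 - \alpha^2}.$$
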
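Plugged into Eq. (D5), it leads to Eq. (30).

Appendix E: Decomposition of the growth rate of the Gaussian model

Since the Gaussian model can be obtained as a continuous limit of the discrete model, Eqs. (32)-(37) directly result from Eqs. (14)-(22) by taking the same limit. The decomposition can also be derived directly from the general formula of Eq. (30), as we illustrate here in the simplest case where the two limits are taken.

The first limit, of perfect selectivity, corresponds to $\sigma_s^2 \to 0$, such that Eq. (30) becomes

$$\Lambda = \Lambda^* - \frac{1}{2}\ln(2\pi\sigma_\pi^2) - \frac{1}{2\sigma_\pi^2}\left[\frac{\lambda^2 + (1 - \kappa)^2 - 2\lambda(1 - \kappa)a}{1 - a^2}\,\sigma^2_{x_1|x_0} + \kappa^2\sigma^2_{y_1|x_1}\right]. \tag{E1}$$

The second limit, of no inheritance, simply corresponds to setting $\lambda = 0$ in this equation, so that

$$\Lambda = \Lambda^* - \frac{1}{2}\ln(2\pi\sigma_\pi^2) - \frac{1}{2\sigma_\pi^2}\left((1 - \kappa)^2\sigma_x^2 + \kappa^2\sigma^2_{y_1|x_1}\right), \tag{E2}$$

where $\sigma_x^2 = \sigma^2_{x_1|x_0}/(1 - a^2)$ represents the stationary variance of the environmental process, $\sigma_x^2 = \mathbb{E}[X_t^2]$. The optimal strategy $\pi$ is obtained by optimizing $\Lambda$ over $\kappa$ and $\sigma_\pi^2$, which leads to

$$\hat\kappa = \frac{1}{1 + \sigma^2_{y_1|x_1}/\sigma_x^2}, \qquad \hat\sigma_\pi^2 = \left(\frac{1}{\sigma_x^2} + \frac{1}{\sigma^2_{y_1|x_1}}\right)^{-1}. \tag{E3}$$

$\hat\kappa$ can also be written $\hat\kappa = \sigma_x^2/\sigma^2_{y_1}$, where $\sigma^2_{y_1} = \sigma_x^2 + \sigma^2_{y_1|x_1}$ represents the stationary variance of $y_t$. As expected from the analysis of the discrete model, we verify that the optimal strategy implements a Bayesian estimation, i.e., $\pi = P_{X|Y}$ [see Appendix C]. We also verify that the optimal growth rate,

$$\hat\Lambda = \Lambda^* - \frac{1}{2}\ln(2\pi e\hat\sigma_\pi^2), \tag{E4}$$

is equivalently written

$$\hat\Lambda = \Lambda^* - h(X) + I(X;Y), \tag{E5}$$

where

$$h(X) = \frac{1}{2}\ln(2\pi e\sigma_x^2), \quad \text{and} \quad I(X;Y) = \frac{1}{2}\ln\left(1 + \frac{\sigma_x^2}{\sigma^2_{y_1|x_1}}\right). \tag{E6}$$

More generally, by introducing the optimal values $\hat\kappa$ and $\hat\sigma_\pi^2$ of Eq. (E3), so that $(1 - \kappa)^2\sigma_x^2 + \kappa^2\sigma^2_{y_1|x_1} = \hat\sigma_\pi^2 + (\kappa - \hat\kappa)^2\sigma^2_{y_1}$, we verify that Eq. (E2) is equivalent to

$$\Lambda = \hat\Lambda - \frac{1}{2}\left(\frac{\hat\sigma_\pi^2 + (\kappa - \hat\kappa)^2\sigma^2_{y_1}}{\sigma_\pi^2} - \ln\frac{\hat\sigma_\pi^2}{\sigma_\pi^2} - 1\right) = \hat\Lambda - \mathbb{E}_Y[D(P_{X|Y}(.|Y)\|\pi(.|Y))], \tag{E7}$$

as expected from Eq. (13).


Appendix F: Gaussian model with individual sensors
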

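Using the formulae of Appendix C, the term to maximize in Eq. (42) can be written

$$I(X;Y) - \mathbb{E}_Y[D(P_{X|Y}\|G_{\sigma_\pi^2+\kappa^2\sigma^2_{\psi|y}}(\cdot - \kappa Y))] = \frac{1}{2}\ln\left(1 + \frac{\sigma_x^2}{\sigma^2_{y|x}}\right) - \frac{1}{2}\left(\frac{\sigma^2_{x|y} + (\kappa - \kappa_0)^2\sigma_y^2}{\sigma_\pi^2 + \kappa^2\sigma^2_{\psi|y}} - 1 - \ln\frac{\sigma^2_{x|y}}{\sigma_\pi^2 + \kappa^2\sigma^2_{\psi|y}}\right). \tag{F1}$$

When $y \to x$, $\kappa_0 \to 1$, $\sigma^2_{y|x} \to 0$ and $\sigma^2_{x|y} \to 0$ but $\sigma^2_{x|y}/\sigma^2_{y|x} \to 1$ and it simplifies to

$$\frac{1}{2}\left(\ln\frac{\sigma_x^2}{\sigma_\pi^2 + \kappa^2\sigma^2_{\psi|x}} + 1 - \frac{(\kappa - 1)^2\sigma_x^2}{\sigma_\pi^2 + \kappa^2\sigma^2_{\psi|x}}\right). \tag{F2}$$

The maximum over $\sigma_\pi^2$ is reached for $\sigma_\pi^2 = 0$ and taking the derivative with respect to $\kappa$ leads to

$$\kappa^2\sigma^2_{\psi|x} + (\kappa - 1)\sigma_x^2 = 0, \tag{F3}$$

whose solution is given in Eq. (43).

Appendix G: The Gaussian model as a limit of the general model of Ref. (16)


A general model is defined by Eqs. [S1]-[S2] in the Supporting Information of (16), which we repeat here with only
slightly modified notations: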


7o 

= Xoyo + KoZt+<^0(l>0 + l^H, JZ// ~ A/'(0,CT|f), 

(Gl) 

4>o 

= S 0 J 0 + PoVt + t^D, JZD ~ A/'(0,(j|)), 

(G2) 

S{(l)o,xt) 

= exp[rmax - (</o - Xtf/{2al)], 

(G3) 

Xt 

= axt-i + bt, ~ A/'(0,ct^^|,^J, 

(G4) 

yt 

= xt + b{, ~ A/'(0,cry|^), 

(G5) 

Zt 

= xt + b'l, ~ A/'(0,Crf|,,). 

(G 6 ) 



(G7) 


Without loss of generality it can be assumed that = 1. The formula for the growth rate of this general model is 
given with an error in Eq. [S3] of (16). The correct formula is 


A 1 , a 

A — rniax“t”“ In 


2 r] 2 r/(l-a 2 ) [ 


(t>2 + (1 - po)^)(l + aa) - 2i;(l - po)(a + a) 2 , 2 


1 — aa 


where ~ a^)) where 77 and v given by 

V = ^ 0(1 + (^I)) + Wo^O) V = (wq + Ko)0o + (1 — Po)'^0) 


+ Po(l “ 2aAo + Ag)CT^|,j, + 

(G 8 ) 


and 


2A 


a = 


1 + A2 + dl, + ((1 - A2 - d|,)2 + 4d| 


1 / 2 ’ 


with 


— \^H 


2 , ^Q<^D \ 


(T^ + 1 / (Tn + 1 


A = Ao + 4« 


D 


+ 1 


(G9) 


(GIO) 


(Gll) 


These formulae reduce to Eq. (30) when taking 9o = X, po = k, ojq = 1, cr|j = Xq = kq = cr]^ = cr^|^ = 0 and 


$r_{\max} = \ln K - (1/2)\ln(2\pi\sigma_s^2)$.


Appendix H: Feedback control out of equilibrium 


The state $x_t$ of a system in contact with a heat bath is measured as $y_t$ at regular intervals of time $\tau$, upon
which the potential in which the system evolves is changed from $V_{t-1}(x)$ to $V_t(x)$. This change is made without
knowing the current state $x_t$, but may depend on the history of past measurements $y^t = (y_1,\dots,y_t)$ as well as on
the history of past states at the time of these measurements, $x^{t-1} = (x_1,\dots,x_{t-1})$. If we assume that the potential
is controllable by one or several parameters $\ell_t$, we therefore consider, in the most general case, that $\ell_t = \ell(y^t, x^{t-1})$
[in more constrained cases, it may depend only on some of these variables, e.g., $\ell_t = \ell(y^t)$ when only the present and
past measurements are available]. Between two measurements, the system relaxes in a constant potential $V_t(x)$
but may not reach equilibrium; its dynamics is generally stochastic, due to the interaction with the heat bath,
and may for instance be described by a master equation with rates satisfying detailed balance. When changing
the potential from $V_{t-1}(x)$ to $V_t(x)$, a demon extracts a work $W_t = V_{t-1}(x_t) - V_t(x_t)$. The goal of the demon is
either to optimize the total extracted work $W_{\rm tot} = E[\sum_t W_t]$ or, if $\tau$ itself is controllable, to optimize the power $W_{\rm tot}/\tau$.


To formalize this problem, we denote by $p^\tau_{t-1}(x_t)$ the probability of the system to be in state $x_t$ at the time of the $t$-th
measurement: this probability depends explicitly only on $\ell_{t-1}$ and $x_{t-1}$, which characterize, respectively, the potential
$V_{t-1}(x)$ and the state of the system when this potential is switched on. Introducing $F_t = -\beta^{-1} \ln \sum_x e^{-\beta V_t(x)}$ and
$p^\infty_t(x) = e^{-\beta (V_t(x) - F_t)}$ (also denoted $\pi$ in the main text), the extracted work may be decomposed as

$$ W_t(x^t, y^t) = V_{t-1}(x_t) - V_t(x_t) = \beta^{-1} \ln \frac{p^\infty_t(x_t)}{p^\tau_{t-1}(x_t)} + \beta^{-1} \ln \frac{p^\tau_{t-1}(x_t)}{p^\infty_{t-1}(x_t)} - (F_t - F_{t-1}). \tag{H1} $$
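We now consider the past history as given and average over $(X_t, Y_t)$ to define

$$ E_t[W_t] = E_{X_t, Y_t \,|\, X^{t-1} = x^{t-1},\, Y^{t-1} = y^{t-1}}[W_t(X^t, Y^t)]. \tag{H2} $$

Since $P_{X_t | X^{t-1}, Y^{t-1}}(x_t | x^{t-1}, y^{t-1}) = p^\tau_{t-1}(x_t)$, we have

$$ \beta E_t[W_t] = I(X_t; Y_t | x^{t-1}, y^{t-1}) - E[D(q^\tau_{t-1} \| p^\infty_t)] + E[D(p^\tau_{t-1} \| p^\infty_{t-1})] - \beta E[F_t - F_{t-1}], \tag{H3} $$

where

$$ q^\tau_{t-1}(x_t | y_t) = P_{X_t | X^{t-1}, Y^t}(x_t | x^{t-1}, y^t) = \frac{P_{Y|X}(y_t | x_t)\, p^\tau_{t-1}(x_t)}{\sum_x P_{Y|X}(y_t | x)\, p^\tau_{t-1}(x)} \tag{H4} $$

and

$$ I(X_t; Y_t | x^{t-1}, y^{t-1}) = \sum_{x_t, y_t} P_{Y|X}(y_t | x_t)\, p^\tau_{t-1}(x_t) \ln \frac{q^\tau_{t-1}(x_t | y_t)}{p^\tau_{t-1}(x_t)}. \tag{H5} $$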


The total work $W_{\rm tot}$ is obtained as $W_{\rm tot} = E_{x^{t-1}, y^{t-1}}[E_t[W_t]]$. When $\tau \to \infty$, the third term on the right-hand
side of Eq. (H3) vanishes and we recover the equilibrium result, Eq. (51).
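
The step from Eq. (H1) to Eq. (H3) rests on one identity: averaging the log-ratio of the post-update equilibrium distribution to the pre-measurement distribution gives the mutual information minus the mean relative entropy between the posterior and the new equilibrium distribution. The following minimal sketch checks this identity numerically on a small discrete example. The function names, the state-space sizes and the randomly drawn distributions are illustrative assumptions, not part of the original derivation.

```python
import math
import random


def normalize(weights):
    """Rescale positive weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """Relative entropy D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def check_decomposition(n_x=4, n_y=3, seed=0):
    """Return both sides of the identity behind the first two terms of Eq. (H3).

    All distributions are drawn at random (an assumption for illustration).
    """
    rng = random.Random(seed)
    # prior[x] plays the role of p^tau_{t-1}(x)
    prior = normalize([rng.random() + 0.1 for _ in range(n_x)])
    # channel[x][y] plays the role of P_{Y|X}(y|x)
    channel = [normalize([rng.random() + 0.1 for _ in range(n_y)])
               for _ in range(n_x)]
    # target[y][x] plays the role of p^infty_t(x), which may depend on y
    target = [normalize([rng.random() + 0.1 for _ in range(n_x)])
              for _ in range(n_y)]

    p_y = [sum(prior[x] * channel[x][y] for x in range(n_x))
           for y in range(n_y)]
    # posterior[y][x] is q(x|y) as in Eq. (H4)
    posterior = [[prior[x] * channel[x][y] / p_y[y] for x in range(n_x)]
                 for y in range(n_y)]

    lhs = sum(prior[x] * channel[x][y] * math.log(target[y][x] / prior[x])
              for x in range(n_x) for y in range(n_y))
    mutual_info = sum(
        prior[x] * channel[x][y] * math.log(posterior[y][x] / prior[x])
        for x in range(n_x) for y in range(n_y))
    mean_kl = sum(p_y[y] * kl(posterior[y], target[y]) for y in range(n_y))
    return lhs, mutual_info - mean_kl
```

Calling `check_decomposition()` returns two numbers that should agree up to floating-point rounding, whatever the seed.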


This formalism can be applied to a Brownian particle in a controllable harmonic potential. For simplicity, we
assume that only the location of the potential can be controlled, and its stiffness $k$ is fixed to $k = 1$. We also set
$\beta = 1$. The potential $V_t(x) = (x - \ell_t)^2/2$ is characterized by the location $\ell_t$ of its minimum, and $F_t = F_{t-1}$ for all $t$.
We take the relaxation dynamics between measurements to be described by a Fokker-Planck equation,

$$ \partial_\tau p^\tau_t(x) = \partial_x\big(\partial_x V_t(x)\, p^\tau_t(x)\big) + \partial_x^2 p^\tau_t(x), \tag{H6} $$

with the initial condition $p^0_t(x) = \delta(x - x_t)$. This equation is easily solved, as its solution is Gaussian at all times:
$p^\tau_t(x) = G_{\sigma_\tau^2}(x - \mu^\tau_t)$ with

$$ \partial_\tau \mu^\tau_t + \mu^\tau_t = \ell_t, \qquad \mu^0_t = x_t, \tag{H7} $$

$$ \tfrac{1}{2} \partial_\tau \sigma_\tau^2 + \sigma_\tau^2 = 1, \qquad \sigma_0^2 = 0, \tag{H8} $$

so that

$$ \mu^\tau_t = (1 - e^{-\tau})\, \ell_t + e^{-\tau} x_t, \tag{H9} $$

$$ \sigma_\tau^2 = 1 - e^{-2\tau}. \tag{H10} $$
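
As a quick sanity check on Eqs. (H7)-(H10), the moment equations can be integrated numerically and compared with the closed-form solutions. The sketch below uses a plain forward Euler scheme. The function names, the step count and the test values are illustrative assumptions; `loc` stands for the location of the potential minimum.

```python
import math


def integrate_moments(x0, loc, tau, n_steps=20_000):
    """Forward-Euler integration of the moment equations (H7)-(H8).

    d(mu)/ds = loc - mu,  mu(0) = x0
    d(var)/ds = 2 * (1 - var),  var(0) = 0
    """
    ds = tau / n_steps
    mu, var = x0, 0.0
    for _ in range(n_steps):
        mu += ds * (loc - mu)
        var += ds * 2.0 * (1.0 - var)
    return mu, var


def closed_form_moments(x0, loc, tau):
    """Closed-form mean and variance, Eqs. (H9)-(H10)."""
    mu = (1.0 - math.exp(-tau)) * loc + math.exp(-tau) * x0
    var = 1.0 - math.exp(-2.0 * tau)
    return mu, var
```

For any starting point, potential location and relaxation time, the two functions should agree up to the discretization error of the Euler scheme, which shrinks as `n_steps` grows.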


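When $\tau \to \infty$, $p^\tau_t(x)$ converges to the equilibrium distribution $p^\infty_t(x) = G_1(x - \ell_t)$. Using $P_{Y|X}(y_t | x_t) = G_{\sigma^2_{y|x}}(y_t - x_t)$
and applying Eq. (C4), $q^\tau_{t-1}$ is found to be

$$ q^\tau_{t-1}(x_t | y_t) = G_{\sigma^2_{x|y}}\big(x_t - (1 - \kappa)\mu^\tau_{t-1} - \kappa y_t\big), \quad \text{with } \kappa = \frac{1}{1 + \sigma^2_{y|x}/\sigma_\tau^2} \text{ and } \sigma^2_{x|y} = \kappa\, \sigma^2_{y|x}. \tag{H11} $$

The first term in Eq. (H3) is therefore

$$ I(X_t; Y_t | x^{t-1}, y^{t-1}) = \frac{1}{2} \ln\left(1 + \frac{\sigma_\tau^2}{\sigma^2_{y|x}}\right). \tag{H12} $$

The second term is

$$ E[D(q^\tau_{t-1} \| p^\infty_t)] = \frac{1}{2}\left(\sigma^2_{x|y} + E[z_t^2] - \ln \sigma^2_{x|y} - 1\right), \tag{H13} $$

with

$$ z_t = (1 - \kappa)\mu^\tau_{t-1} + \kappa y_t - \ell_t. \tag{H14} $$

The third term is

$$ E[D(p^\tau_{t-1} \| p^\infty_{t-1})] = \frac{1}{2}\left(\sigma_\tau^2 + E[z_t'^2] - \ln \sigma_\tau^2 - 1\right), \tag{H15} $$

with

$$ z'_t = \mu^\tau_{t-1} - \ell_{t-1} = e^{-\tau}(x_{t-1} - \ell_{t-1}). \tag{H16} $$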


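Given $(x^{t-1}, y^{t-1})$, the only term depending on $\ell_t$ is $E[z_t^2]$ in Eq. (H13). It is optimized by choosing $\ell_t$ so as to have
$z_t = 0$:

$$ \hat\ell_t = \kappa y_t + (1 - \kappa)\mu^\tau_{t-1} = \kappa y_t + (1 - \kappa)\big[(1 - e^{-\tau})\ell_{t-1} + e^{-\tau} x_{t-1}\big]. \tag{H17} $$

By taking $\ell_{t-1} = \hat\ell_{t-1}$, this defines recursively a series of optimal translations $\hat\ell_t$.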
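
The recursion of Eq. (H17) can be simulated directly. The sketch below draws each relaxation step exactly from the Gaussian solution of Eqs. (H9)-(H10), applies the update of Eq. (H17), and averages the extracted work over a long trajectory. It assumes, as in the text, unit stiffness and unit inverse temperature. The function names, the trajectory length, the burn-in and the seed are illustrative assumptions. The prediction it is compared against, one half of the relaxation variance times one minus the posterior variance, follows from combining Eqs. (H12), (H13) and (H15) at the optimum.

```python
import math
import random


def simulate_mean_work(tau, var_yx, n_steps=100_000, burn_in=100, seed=1):
    """Monte Carlo estimate of the mean work per step under the rule (H17)."""
    rng = random.Random(seed)
    decay = math.exp(-tau)
    var_tau = 1.0 - math.exp(-2.0 * tau)      # relaxation variance, Eq. (H10)
    kappa = var_tau / (var_tau + var_yx)      # gain appearing in Eq. (H11)
    x_prev, loc_prev = 0.0, 0.0               # arbitrary initial state and location
    total, count = 0.0, 0
    for step in range(n_steps):
        mu = (1.0 - decay) * loc_prev + decay * x_prev   # mean, Eq. (H9)
        x = rng.gauss(mu, math.sqrt(var_tau))            # state at measurement
        y = rng.gauss(x, math.sqrt(var_yx))              # noisy measurement
        loc = kappa * y + (1.0 - kappa) * mu             # update, Eq. (H17)
        # Work extracted when the potential moves from loc_prev to loc
        work = 0.5 * (x - loc_prev) ** 2 - 0.5 * (x - loc) ** 2
        if step >= burn_in:
            total += work
            count += 1
        x_prev, loc_prev = x, loc
    return total / count


def predicted_mean_work(tau, var_yx):
    """Analytical mean work at the optimum: var_tau * (1 - var_xy) / 2."""
    var_tau = 1.0 - math.exp(-2.0 * tau)
    var_xy = var_tau * var_yx / (var_tau + var_yx)   # posterior variance
    return 0.5 * var_tau * (1.0 - var_xy)
```

For example, `simulate_mean_work(0.5, 0.25)` should fall close to `predicted_mean_work(0.5, 0.25)`, with a statistical error that decreases as `n_steps` increases.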

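To express the optimal work, it remains to evaluate $E[z_t'^2]$ for $\ell_{t-1} = \hat\ell_{t-1}$. Since $x_t - \hat\ell_t = (1 - \kappa)(x_t - \mu^\tau_{t-1}) - \kappa(y_t - x_t)$,
where $x_t - \mu^\tau_{t-1}$ and $y_t - x_t$ are statistically independent, we have

$$ E[(x_t - \hat\ell_t)^2] = (1 - \kappa)^2 \sigma_\tau^2 + \kappa^2 \sigma^2_{y|x} = \sigma^2_{x|y}, \tag{H18} $$

and therefore $E[z_t'^2] = e^{-2\tau} \sigma^2_{x|y}$. All together, we obtain

$$ \max_{\ell_t} E[W_t] = \frac{1}{2}\left[\ln\left(1 + \frac{\sigma_\tau^2}{\sigma^2_{y|x}}\right) - \left(\sigma^2_{x|y} - \ln \sigma^2_{x|y} - 1\right) + \left(\sigma_\tau^2 + e^{-2\tau}\sigma^2_{x|y} - \ln \sigma_\tau^2 - 1\right)\right], \tag{H19} $$

which, given that $\sigma_\tau^2 = 1 - e^{-2\tau}$ and $\sigma^2_{x|y} = (\sigma_\tau^{-2} + \sigma_{y|x}^{-2})^{-1}$, simplifies to $\max_{\ell_t} E[W_t] = \sigma_\tau^2 (1 - \sigma^2_{x|y})/2$, or, in terms of
$\tau$ and $\sigma^2_{y|x}$ only,

$$ \max_{\ell_t} E[W_t] = \frac{1}{2}(1 - e^{-2\tau})\left(1 - \big((1 - e^{-2\tau})^{-1} + \sigma_{y|x}^{-2}\big)^{-1}\right). \tag{H20} $$

When $\tau \to \infty$, we recover the equilibrium result, $E[W_t] \le I(X;Y) - \min_\phi E_Y[D(P_{X|Y}(\cdot|Y) \| G_1(\cdot - \phi(Y)))]$, with
$I(X;Y) = [\ln(1 + 1/\sigma^2_{y|x})]/2$ and $\min_\phi E_Y[D(P_{X|Y}(\cdot|Y) \| G_1(\cdot - \phi(Y)))] = D(G_{\sigma^2_{x|y}} \| G_1) = (\sigma^2_{x|y} - \ln \sigma^2_{x|y} - 1)/2$.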



